Summary


Random  variables  that  stand  for  cost,  loss  or  damage  must  be  confronted  in  numerous  situations. 
Dealing  with  them  systematically  for  purposes  in  risk  management,  optimization  and  statistics  is  the 
theme  of  this  project,  which  brings  together  ideas  coming  from  many  different  areas. 

Measures  of  risk  can  be  used  to  quantify  the  hazard  in  a  random  variable  by  a  single  value  which  can 
substitute  for  the  otherwise  uncertain  outcomes  in  a  formulation  of  constraints  and  objectives.  Such 
quantifications  of  risk  can  be  portrayed  on  a  higher  level  as  generated  from  penalty-type  expressions  of 
“regret”  about  the  mix  of  potential  outcomes.  A  trade-off  between  an  up-front  level  of  hazard  and  the 
uncertain  residual  hazard  underlies  that  derivation.  Regret  is  the  mirror  image  of  utility,  a  familiar 
concept  for  dealing  with  gains  instead  of  losses,  but  regret  concerns  hazards  relative  to  a  benchmark. 
It  bridges  risk  measures  and  expected  utility,  thereby  reconciling  those  two  approaches  to  optimization 
under  uncertainty. 

Statistical  estimation  is  inevitably  a  partner  with  risk  management  in  handling  hazards,  which  may 
be  known  only  partially  through  a  data  base.  However,  a  much  deeper  connection  has  come  to  light 
with  statistical  theory  itself,  in  particular  regression.  Very  general  measures  of  error  can  associate  with 
any  hazard  variable  a  “statistic”  along  with  a  “deviation”  which  quantifies  the  variable’s  nonconstancy. 
Measures  of  deviation,  on  the  other  hand,  are  paired  closely  with  measures  of  risk  exhibiting  “aversity.” 
A  direct  correspondence  can  furthermore  be  identified  between  measures  of  error  and  measures  of 
regret.  The  fundamental  quadrangle  of  risk  developed  here  puts  all  of  this  together  in  a  unified 
scheme. 
1  Introduction


The  challenges  of  dealing  with  risk  pervade  many  areas  of  management  and  engineering.  The  decisions 
that  have  to  be  made  in  risky  situations  must  nonetheless  confront  constraints  on  their  consequences, 
no  matter  how  uncertain  those  consequences  may  be.  Furthermore,  the  decisions  need  to  be  open  to 
comparisons  which  enable  some  kind  of  optimization  to  take  place. 

When  uncertainty  is  modeled  probabilistically  with  random  variables,  practical  challenges  arise 
about  estimating  properties  of  those  random  variables  and  their  interrelationships.  Information  may 
come  from  empirical  distributions  generated  by  sampling,  or  there  may  only  be  databases  representing 
information  accumulated  somehow  or  other  in  the  past.  Standard  approaches  to  statistical  analysis 
and  regression  in  terms  of  expectation,  variance  and  covariance  may  then  be  brought  in.  But  the 
prospect  is  now  emerging  of  a  vastly  expanded  array  of  tools  which  can  be  finely  tuned  to  reflect  the 
various  ways  that  risk  may  be  assessed  and,  at  least  to  some  extent,  controlled. 


                    risk $\mathcal{R}$  <--->  $\mathcal{D}$ deviation
   optimization          ↑↓        $\mathcal{S}$        ↓↑          estimation
                  regret $\mathcal{V}$  <--->  $\mathcal{E}$ error

Diagram 1: The Fundamental Risk Quadrangle


This  project  is  aimed  at  promoting  and  developing  such  tools  in  a  new  paradigm  we  call  the  risk 
quadrangle,  which  is  shown  in  Diagram  1.  It  brings  together  several  lines  of  research  and  methodology 
which, until now, have been pursued separately in different professional areas with little inkling of their
fertile  interplay.  The  ideas  in  these  areas  form  such  a  vast  subject  that  a  broad  survey  with  full 
references  is  beyond  feasibility.  Our  contribution  here  must,  in  part,  be  seen  therefore  as  providing 
an  overview  of  the  connections,  supplemented  by  instructive  examples  and  the  identification  of  issues 
in  need  of  more  attention.  However,  many  new  facts  are  brought  to  light  along  with  new  results  and 
broad  extensions  of  earlier  results. 


$\mathcal{R}(X)$ provides a numerical surrogate for the overall hazard in $X$,
$\mathcal{D}(X)$ measures the “nonconstancy” in $X$ as its uncertainty,
$\mathcal{E}(X)$ measures the “nonzeroness” in $X$,
$\mathcal{V}(X)$ measures the “regret” in facing the mix of outcomes of $X$,
$\mathcal{S}(X)$ is the “statistic” associated with $X$ through $\mathcal{E}$ and $\mathcal{V}$.

Diagram  2:  The  Quantifications  in  the  Quadrangle. 


The  context  is  that  of  random  variables  that  can  be  thought  of  as  standing  for  uncertain  “costs” 
or  “losses”  in  the  broadest  sense,  not  necessarily  monetary  (with  a  negative  “cost”  corresponding 
perhaps  to  a  “reward”).  The  language  of  cost  gives  the  orientation  that  we  would  like  the  outcomes 
of  these  random  variables  to  be  lower  rather  than  higher,  or  to  be  held  below  some  threshold.  All 
sorts  of  indicators  that  may  provide  signals  about  hazards  can  be  viewed  from  this  perspective.  The 
quadrangle elements provide numerical “quantifications” of them (not only finite numbers but in some
cases $\infty$) which can be employed for various purposes.

It will help, in understanding the quadrangle, to begin at the upper left corner, where $\mathcal{R}$ is a
so-called measure of risk. The specific sense of this needs clarification, since there are conflicting angles
to the meaning of “risk.” In denoting a random cost by $X$ and a constant by $C$, a key question is
how to give meaning to a statement that $X$ is “adequately” $\le C$ with respect to the preferences of
a decision maker who realizes that uncertainty might inescapably generate some outcomes of $X$ that
are $> C$. The role of a risk measure $\mathcal{R}$, in the sense intended here, is to answer this question by
aggregating the overall uncertain cost in $X$ into a single numerical value $\mathcal{R}(X)$ in order to

    model “$X$ adequately $\le C$” by the inequality $\mathcal{R}(X) \le C$.

There are familiar ways of doing this. One version could be that $X$ is $\le C$ on average, as
symbolized by $\mu(X) \le C$ with $\mu(X)$ the mean value, or in equivalent notation (both are convenient
to maintain), $EX \le C$ with $EX$ the expected value. Then $\mathcal{R}(X) = \mu(X) = EX$. A tighter version
could be $\mu(X) + \lambda\sigma(X) \le C$ with $\lambda$ giving a positive multiple of the standard deviation $\sigma(X)$ so as
to provide a safety margin reminiscent of a confidence level in statistics; then $\mathcal{R}(X) = \mu(X) + \lambda\sigma(X)$.
The alternative idea that the inequality should hold at least with a certain probability $\alpha \in (0,1)$
corresponds to $q_\alpha(X) \le C$ with $q_\alpha(X)$ denoting the $\alpha$-quantile of $X$, whereas insisting that $X \le C$
almost surely can be written as $\sup X \le C$ with $\sup X$ standing for the essential supremum of $X$. Then
$\mathcal{R}(X) = q_\alpha(X)$ or $\mathcal{R}(X) = \sup X$, respectively.^1 However, these examples are just initial possibilities
among many for which pros and cons need to be appreciated.
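
The four candidate risk measures just listed can be compared on data. The following is a minimal sketch, assuming an equally weighted finite sample stands in for the distribution of $X$; the function names, the sample `costs` and the threshold `budget` are hypothetical and serve only as illustration.

```python
import math
import statistics


def mean_risk(sample):
    """R(X) = mu(X): 'X adequately <= C' means X <= C on average."""
    return statistics.fmean(sample)


def safety_margin_risk(sample, lam):
    """R(X) = mu(X) + lam * sigma(X), with sigma the population standard deviation."""
    return statistics.fmean(sample) + lam * statistics.pstdev(sample)


def quantile_risk(sample, alpha):
    """R(X) = q_alpha(X), taken here as the lowest x with F_X(x) >= alpha."""
    ordered = sorted(sample)
    n = len(ordered)
    # The empirical CDF at ordered[k-1] is at least k/n, so we need the
    # smallest k with k/n >= alpha; the tolerance guards against round-off.
    k = max(1, math.ceil(alpha * n - 1e-12))
    return ordered[k - 1]


def worst_case_risk(sample):
    """R(X) = sup X; for a finite sample this is simply the maximum."""
    return max(sample)


# Hypothetical equally likely cost outcomes, for illustration only.
costs = [2, 4, 4, 5, 7, 9, 10, 12, 15, 22]
budget = 16  # hypothetical threshold C

for name, value in [
    ("mean", mean_risk(costs)),
    ("mean + 1.0 * sigma", safety_margin_risk(costs, 1.0)),
    ("0.9-quantile", quantile_risk(costs, 0.9)),
    ("sup", worst_case_risk(costs)),
]:
    print(f"{name:>20}: R(X) = {value:.3f}, adequately <= C: {value <= budget}")
```

On this sample the same threshold is met under the first three readings of “adequately” and violated under the worst-case reading, which is the kind of difference the choice of $\mathcal{R}$ is meant to express.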

A typical situation in optimization that illustrates the compelling need for measures of risk revolves
around a family of random “costs” that depend on a decision vector $x$ belonging to a subset $S \subset \mathbb{R}^n$,

    $X_i(x)$ for $i = 0, 1, \ldots, m$, where $x = (x_1, \ldots, x_n)$.  (1.1)

The handicap is that $x$ can usually do no more than influence the probability distribution of each
of the “costs.” A potential aim in choosing $x$ from $S$ would be to keep the random variable $X_i(x)$
adequately $\le c_i$ for $i = 1, \ldots, m$, while achieving the lowest $c_0$ such that $X_0(x)$ is adequately $\le c_0$. The
way “adequately” is modeled could be different for each $i$, and the notion of a risk measure provides
the perfect tool. A selection of risk measures $\mathcal{R}_i$ that pins down the intended sense of “adequately”
in each case leads to an optimization problem having the form

    choose $x \in S$ to minimize $\mathcal{R}_0(X_0(x))$ subject to $\mathcal{R}_i(X_i(x)) \le c_i$ for $i = 1, \ldots, m$.  (1.2)

Besides pointing the way toward risk-oriented problem formulations to which optimization technology
can successfully be applied, this illustration brings another issue to the foreground. In selecting a
measure of risk $\mathcal{R}_i$, it may not be enough just to rely on $\mathcal{R}_i$ having an appealing interpretation. An
important consideration may be whether $\mathcal{R}_i$ produces expressions $\mathcal{R}_i(X_i(x))$ that behave reasonably
as functions of $x = (x_1, \ldots, x_n)$. Axioms laying out sensible standards for a measure of risk, such as the
coherency introduced in Artzner et al. [1999], are vital for that.^2

Another idea in dealing with uncertainty in a random variable $X$ is to quantify its nonconstancy
through a measure of deviation $\mathcal{D}$, with $\mathcal{D}(X)$ then being a generalization of $\sigma(X)$. Again, axioms
have to be articulated. The distinction between $\mathcal{D}$ and $\mathcal{R}$ at the top of the quadrangle is essential,
despite a very close connection, because of differences in axioms and roles played in applications.

Motivation for nonstandard measures of deviation is apparent in particular in finance because of the
heavy concentration there on variance (or equivalently standard deviation) despite shortcomings in
capturing dangerous “tail behavior” in probability distributions. In portfolio theory, the rate of return
of the portfolio is a random variable $X(x)$ depending on the vector $x$ that gives the proportions of
various securities included in the portfolio. Bounds are placed on $\sigma(X(x))$ or this quantity is minimized
subject to side conditions on $x$. Such an approach can be justified when the random variables have
normal distributions, but when the heavy tail behavior of nonnormal distributions enters the scene,
doubts arise. It may be better then to replace standard deviation by a different deviation measure,
which perhaps could even act on $X(x)$ asymmetrically.^3

^1 Note that $\mathcal{R}(X) = \sup X$ gives examples where $\mathcal{R}(X)$ might be $\infty$.

^2 The axioms will be developed in Section 3 and their consequences for optimization problems like (1.2) fully pinned
down in Section 5.

The introduction of nonstandard deviation measures $\mathcal{D}$ in place of $\sigma$ brings up the question of
whether this might entail some kind of generalization in statistical theory itself. That is indeed one
of the questions our quadrangle scheme is aimed at answering, as will be explained shortly.^4

We turn now to the lower left corner of the quadrangle. In a measure of regret $\mathcal{V}$, the value $\mathcal{V}(X)$
stands for the net displeasure perceived in the potential mix of outcomes of a random “cost” $X$ which
may sometimes be $> 0$ (bad) and sometimes $\le 0$ (OK or better).^5 ^6 Regret comes up in penalty
approaches to constraints in stochastic optimization and, in mirror image, corresponds to measures of
“utility” $\mathcal{U}$ in a context of gains $Y$ instead of losses $X$, which is typical in economics: $\mathcal{V}(X) = -\mathcal{U}(-X)$,
$\mathcal{U}(Y) = -\mathcal{V}(-Y)$. Regret obeys $\mathcal{V}(0) = 0$, so in this pairing we have to focus on utility measures
that have $\mathcal{U}(0) = 0$; we say then that $\mathcal{U}$ is a measure of relative utility. The interpretation is that, in
applying $\mathcal{U}$ to $Y$, we are thinking of $Y$ not as absolute gain but gain relative to some threshold, e.g.,
$Y = Y_0 - B$ where $Y_0$ is absolute gain and $B$ is a benchmark.

Focusing  on  relative  utility  in  this  sense  is  a  positive  feature  of  the  quadrangle  scheme  because 
it  can  help  to  capture  the  sharp  difference  in  attitude  toward  outcomes  above  or  below  a  benchmark 
that is increasingly acknowledged as influencing the preferences of decision makers.^7

Measures of regret $\mathcal{V}$, like measures of deviation $\mathcal{D}$, are profoundly related to measures of risk $\mathcal{R}$,
and one of our tasks will be to bring this all out. Especially important will be a one-to-one correspondence
between measures of deviation and measures of risk under “aversity,” regardless of coherency. A
powerful property of measures of regret, which soon will be discussed, is their ability to generate
measures of risk through trade-off formulas. By means of such formulas, an optimization problem in
the form of (1.2) may be recast in terms of regret instead of risk, and this can be a great simplification.^8

Furthermore, by revealing a deep connection between risk measures and utility, regret reconciles
the seemingly different approaches to optimization based on those concepts.

The interesting question already raised, of whether measures of deviation beyond standard deviation
might fit into some larger development in statistical theory, is our next topic. It brings us to the
lower right corner of the quadrangle, where we speak of a measure of error $\mathcal{E}$ as assigning to a random
variable $X$ a value $\mathcal{E}(X)$ that quantifies the nonzeroness in $X$. Classical examples are the $L^p$-norms

    $\|X\|_1 = E|X|$,  $\|X\|_p = [E(|X|^p)]^{1/p}$ for $p \in (1,\infty)$,  $\|X\|_\infty = \sup |X|$,  (1.3)

but there is much more to think of besides norms.

^3 The “two fund theorem” and other celebrated results of portfolio theory that revolve around standard deviation can
be extended in this direction with CAPM equations replaced by other equations derived from alternative measures of
deviation; cf. Rockafellar et al. [2006b].

^4 Nonstandard deviation measures are also connected to statistics through entropy analysis, cf. Grechuk et al. [2008].

^5 Regret in this sense is distinct from the notion of regret as “opportunity loss” in some versions of decision theory.

^6 In financial terms, if $X$ and $\mathcal{V}(X)$ have units of money, $\mathcal{V}(X)$ can be the compensation deemed appropriate for taking
on the burden of the uncertain loss $X$.

^7 This will be discussed in more detail in Section 4 in the case of utility expressions $\mathcal{U}(Y) = E[u(Y)]$ for an underlying
function $u$. Having $\mathcal{U}(0) = 0$ corresponds to having $u(0) = 0$, which can be achieved by selecting a benchmark and
shifting the graph of a given “absolute” utility so that the benchmark point is at the origin of $\mathbb{R}^2$.

^8 This is the mission of the Regret Theorem in Section 5.

Given an error measure $\mathcal{E}$ and a random variable $X$, one can look for a constant $C$ nearest to $X$
in the sense of minimizing $\mathcal{E}(X - C)$. The resulting minimum “$\mathcal{E}$-distance,” denoted by $\mathcal{D}(X)$, turns
out to be a deviation measure (under assumptions explained later). The $C$ value in the minimum,
denoted by $\mathcal{S}(X)$, can be called the “statistic” associated with $X$ by $\mathcal{E}$. The case of $\mathcal{E}(X) = \|X\|_2$
produces $\mathcal{S}(X) = EX$ and $\mathcal{D}(X) = \sigma(X)$, but many other examples will soon be seen.
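
The projection from an error measure to a statistic and a deviation can be checked numerically in the $L^2$ case. The sketch below is only an illustration under stated assumptions: the distribution of $X$ is an equally weighted finite sample, $\mathcal{E}(X - C)$ is convex in $C$ with a minimizer between the smallest and largest outcome, and a plain ternary search is used as the minimizer. The names `l2_error` and `project_error` are hypothetical.

```python
import math
import statistics


def l2_error(sample):
    """E(X) = ||X||_2 = sqrt(E[X^2]) for an equally weighted sample."""
    return math.sqrt(statistics.fmean([v * v for v in sample]))


def project_error(error, sample, iters=200):
    """Minimize error(X - C) over constants C by ternary search.

    Returns (S, D): the minimizing C (the 'statistic') and the minimum
    value (the 'deviation'). Assumes error(X - C) is convex in C and that
    a minimizer lies between min(sample) and max(sample).
    """
    lo, hi = float(min(sample)), float(max(sample))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if error([v - m1 for v in sample]) <= error([v - m2 for v in sample]):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    c = (lo + hi) / 2
    return c, error([v - c for v in sample])


# Hypothetical equally likely outcomes, for illustration only.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15, 22]
statistic, deviation = project_error(l2_error, sample)

print("S(X) from minimization:", round(statistic, 6))
print("mean of sample:        ", statistics.fmean(sample))
print("D(X) from minimization:", round(deviation, 6))
print("sigma of sample:       ", round(statistics.pstdev(sample), 6))
```

Passing a different error function to `project_error` would give the statistic and deviation paired with that error, provided the convexity assumption still holds.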

The emergence of a particular deviation measure $\mathcal{D}$ and statistic $\mathcal{S}$ from the choice of an error
measure $\mathcal{E}$ has intriguing implications for statistical estimation in the sense of generalized regression.
There is furthermore a deep connection between regression and an optimization problem like
(1.2). The $x$-dependent random variables $X_i(x)$ there might be replaced by convenient approximations
$\hat{X}_i(x)$ developed through regression, and the particular mode of regression might have significant
consequences. We will get back to this shortly.

The optimization and estimation sides of the quadrangle are bound together not only through such
considerations, but also in a more direct manner. The rule that projects from $\mathcal{E}$ onto $\mathcal{D}$ is echoed
by a certainty-uncertainty trade-off formula which projects a regret measure $\mathcal{V}$ onto a risk measure
$\mathcal{R}$. This formula, in which $C + \mathcal{V}(X - C)$ is minimized over $C$, generalizes a rule in Rockafellar and
Uryasev [2000], Rockafellar and Uryasev [2002], for VaR-CVaR computations. It extends the insights
gained beyond that by Ben-Tal and Teboulle [2007] in a context of expected utility, and lines up with
still broader expressions for risk in Krokhmal [2007]. Under a simple relationship between $\mathcal{V}$ and $\mathcal{E}$,
the optimal $C$ value in the trade-off is the same statistic $\mathcal{S}(X)$ as earlier, but that conceptual bond
has been missed. Nothing has hitherto suggested that “error” in its context of approximation might
be inherently related to the very different concept of “regret” and, through that, to “utility.”

Altogether, we arrive in this way at a “quadrangle” of quantifications having the descriptions in
Diagram 2 and the interconnections in Diagram 3.^9 More details will be furnished in Section 3, after
the assumptions needed to justify the relationships have been explained.


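    $\mathcal{R}(X) = EX + \mathcal{D}(X)$,  $\mathcal{D}(X) = \mathcal{R}(X) - EX$

    $\mathcal{V}(X) = EX + \mathcal{E}(X)$,  $\mathcal{E}(X) = \mathcal{V}(X) - EX$

    $\mathcal{R}(X) = \min_C \{\, C + \mathcal{V}(X - C) \,\}$,  $\mathcal{D}(X) = \min_C \{\, \mathcal{E}(X - C) \,\}$

    $\operatorname{argmin}_C \{\, C + \mathcal{V}(X - C) \,\} = \mathcal{S}(X) = \operatorname{argmin}_C \{\, \mathcal{E}(X - C) \,\}$

Diagram 3: The General Relationships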


The paired arrows on the sides of Diagram 1, in contrast to the two-way arrows on the top and
bottom, correspond to the fact that the simple formulas in Diagram 3 for getting $\mathcal{R}$ and $\mathcal{D}$ from $\mathcal{V}$ and
$\mathcal{E}$ are not uniquely invertible. Antecedents $\mathcal{V}$ and $\mathcal{E}$ for $\mathcal{R}$ and $\mathcal{D}$ always exist, even in multiplicity,^10
so the real issue for inversion is the identification of natural, nontrivial antecedents. That is a large
topic with many good answers in the examples in Section 2 and broader principles in Section 3.

More  must  be  said  now  about  how  the  quadrangle  relates  to  statistical  estimation  in  the  form 
of  regression,  and  the  motivation  coming  from  that.  Broader  approaches  to  regression  than  classical 
“least-squares” are not new, but the description to be given here is unprecedentedly broad.

^9 The “argmin” notation refers to the $C$ values that achieve the “min.”

^10 For instance $\mathcal{V}(X) = \mathcal{R}(X) + \alpha|EX|$ and $\mathcal{E}(X) = \mathcal{D}(X) + \alpha|EX|$ with $\alpha > 0$.

Regression is a way of approximating a random variable $Y$ by a function $f(X_1, \ldots, X_n)$ of one or
more other random variables $X_j$ for purposes of anticipating outcome properties or trends. It requires
a way of measuring how far the random difference $Z_f = Y - f(X_1, \ldots, X_n)$ is from 0. That, clearly,
is where error measures $\mathcal{E}$ can come in. The norms (1.3) offer choices, but there may be incentive
for using asymmetric error measures $\mathcal{E}$ that look at more than just $|Z_f|$. When $Y$ has cost or hazard
orientation, underestimations $Y - f(X_1, \ldots, X_n) > 0$ may be more dangerous than overestimations
$Y - f(X_1, \ldots, X_n) < 0$.

For an error measure $\mathcal{E}$ and a collection $\mathcal{C}$ of regression functions $f$, the basic problem of regression
for $Y$ with respect to $X_1, \ldots, X_n$ is to

    minimize $\mathcal{E}(Z_f)$ over $f \in \mathcal{C}$, where $Z_f = Y - f(X_1, \ldots, X_n)$.  (1.4)

An immediate question that comes to mind is how one such version of regression might differ from
another and perhaps be better for some purpose. We provide a simple but revealing answer. As long
as $\mathcal{C}$ has the property that it includes with each $f$ all the translates $f + C$ for constants $C$, problem
(1.4) has the following interpretation:

    minimize $\mathcal{D}(Z_f)$ over all $f \in \mathcal{C}$ such that $\mathcal{S}(Z_f) = 0$,  (1.5)

where $\mathcal{D}$ and $\mathcal{S}$ are the deviation measure and statistic associated with the error measure $\mathcal{E}$. In such
generality, and with additional features as well,^11 this is a new result, but it builds in part on our
earlier theorem in Rockafellar et al. [2008] for the case of linear regression functions $f$.
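
For the familiar $L^2$ error, where the deviation is $\sigma$ and the statistic is the mean, the equivalence of (1.4) and (1.5) can be illustrated on a single regressor. The sketch below is a minimal check under assumptions of our own choosing: the regression family is the set of lines $a + bx$ (which contains all translates), the data are a hypothetical equally weighted sample, and the slope search uses ternary search on an assumed bracket $[-100, 100]$.

```python
import statistics


def least_squares_line(xs, ys):
    """Classical least-squares fit y ~ a + b*x: problem (1.4) with the L2 error."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def deviation_then_statistic(xs, ys, lo=-100.0, hi=100.0, iters=200):
    """Two-step form (1.5) for the same family of lines.

    Step 1: choose the slope minimizing sigma(Y - b*X); the intercept does
            not matter here because sigma ignores constant shifts.
    Step 2: choose the intercept so that the mean of the residual is 0.
    The bracket [lo, hi] for the slope is an assumption of this sketch.
    """
    def residual_sigma(slope):
        return statistics.pstdev([y - slope * x for x, y in zip(xs, ys)])

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if residual_sigma(m1) <= residual_sigma(m2):
            hi = m2
        else:
            lo = m1
    slope = (lo + hi) / 2
    intercept = statistics.fmean([y - slope * x for x, y in zip(xs, ys)])
    return intercept, slope


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

a_ls, b_ls = least_squares_line(xs, ys)
a_ds, b_ds = deviation_then_statistic(xs, ys)
print("least squares:            a = %.6f, b = %.6f" % (a_ls, b_ls))
print("deviation, then statistic: a = %.6f, b = %.6f" % (a_ds, b_ds))
```

Both routes should return the same line up to numerical tolerance, which is what the interpretation (1.5) asserts in this special case.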

Factor models for simplifying work with random variables ordinarily rely on standard least-squares
regression, which corresponds here to $\mathcal{E}$ being the $L^2$ norm in (1.3), so that $\mathcal{D}(Z_f)$ is $\sigma(Z_f)$ and $\mathcal{S}(Z_f)$
is $\mu(Z_f)$. Suppose, for instance, that the “costs” in (1.1) have the form

    $X_i(x) = g_i(x, V_1, \ldots, V_r)$ with respect to random variables $V_k$.  (1.6)

The random variables $V_k$ may have various interdependencies which can be treated by thinking of
them as reflecting the outcomes of certain other, more “primitive,” random variables $W_1, \ldots, W_s$.
This can suggest approximating them through regression as

    $V_k \approx \hat{V}_k = f_k(W_1, \ldots, W_s)$ for $f_k \in \mathcal{C}_k$ and an error measure $\mathcal{E}_k$,  (1.7)

which leads to approximating $X_i(x)$ by

    $\hat{X}_i(x) = g_i(x, \hat{V}_1, \ldots, \hat{V}_r)$ for $i = 0, 1, \ldots, m$.  (1.8)

In the optimization problem (1.2), this replaces the objective and constraint functions $\mathcal{R}_i(X_i(x))$ by
different functions $\mathcal{R}_i(\hat{X}_i(x))$. How will that change the solution? What guarantee is there that a
solution to the altered problem will be close to a solution to the original problem?

That question has received very little attention so far, although we raised it in Rockafellar et al.
[2008] as suggesting that the error measures $\mathcal{E}_k$ in (1.7) should be “tuned” somehow to the quantification
of risk by $\mathcal{R}_i$. We did show there, at least, that if $g_i(x, V_1, \ldots, V_m) = x_1 V_1 + \cdots + x_m V_m$, the $\mathcal{E}_k$'s
should be the error measure in the same quadrangle as the risk measure $\mathcal{R}_i$. Then the expressions
$\mathcal{R}_i(X_i(x))$ and $\mathcal{R}_i(\hat{X}_i(x))$ will be closer to each other as functions of $x$ than otherwise. Although we
do not pursue that further in this survey, we hope that the quadrangle framework we furnish will help
to stimulate more research on the matter.

^11 See the Regression Theorem in Section 5. In general, $\mathcal{S}$ can assign an interval of values, so the constraint in (1.5)
would better be written as $0 \in \mathcal{S}(Z_f)$.

In  the  plan  of  the  project  after  this  introduction,  we  will  first  pass  in  Section  2  to  examples  of 
quadrangles  that  help  to  underscore  our  intentions  and  provide  guidance  for  theory  and  applications. 
This  is  a  compromise  in  which  we,  and  the  reader,  are  held  back  to  some  extent  by  the  postponement 
of  precise  definitions  and  assumptions  that  only  come  in  Section  3.  It  is  an  unusual  way  of  proceeding, 
but  we  take  this  path  from  the  conviction  that  providing  motivation  in  advance  of  technical  details  is 
essential  for  conveying  the  attractions  of  this  wide-ranging  subject. 

Section 3 showcases the Quadrangle Theorem which supports the formulas and relationships in
Diagrams 1, 2, and 3 and specifies the key properties of the quantifiers $\mathcal{R}$, $\mathcal{D}$, $\mathcal{V}$ and $\mathcal{E}$ that propagate
through the scheme. Although some connections have already been indicated elsewhere, this result is
new in its generality and creation of the entire quadrangle with $\mathcal{V}$ and its associated utility $\mathcal{U}$. Also
new  in  Section  3  in  similar  degree  are  the  Scaling  Theorem,  the  Mixing  Theorem  and  the  Reverting 
Theorem,  which  furnish  means  of  constructing  additional  instances  of  quadrangles  from  known  ones. 

Interpretations  and  results  beyond  the  basics  in  Section  3  are  provided  in  Section  4  as  an  aid  to 
more  specialized  applications.  The  main  contribution  there  is  the  Expectation  Theorem,  concerned 
with  the  “expectation  quadrangles”  we  are  about  to  describe.  In  particular,  it  enables  us  to  justify 
a  number  of  the  examples  in  Section  2  and  show  how  they  can  be  extended.  Section  5  presents 
in  more  detail  the  role  of  the  risk  quadrangle  in  applications  to  optimization,  as  in  problem  (1.2), 
and generalized regression as in problem (1.4).^12 The Convexity Theorem indicates how “convex
dependence” of the random variables $X_i(x)$ in (1.1) with respect to $x$ passes over to convexity of the
expressions $\mathcal{R}_i(X_i(x))$ in (1.2) under natural assumptions on $\mathcal{R}_i$. The Regret Theorem provides a
far-reaching new generalization of a well known device from Rockafellar and Uryasev [2002] for facilitating
the solution of optimization problems (1.2) when $\mathcal{R}_i$ is a CVaR risk measure. The Regression Theorem
in Section 5 handles problem (1.4) on a level beyond anything previously attempted.

Duality will occupy our attention in Section 6. Each of the quantifiers $\mathcal{R}$, $\mathcal{D}$, $\mathcal{E}$, $\mathcal{V}$ has a dual
expression in the presence of “closed convexity,” a property we will build into them in Section 3. This is
presented in the Envelope Theorem. Such dualizations shed additional light on modeling motivations.
Although the dualization of a risk measure $\mathcal{R}$ has already been closely investigated, its advantageous
coordination with the dualization of $\mathcal{V}$ is new here together with its echoes in $\mathcal{D}$ and $\mathcal{E}$.

Expectation  Quadrangles.  Many  examples,  but  by  no  means  all,  will  fall  into  the  category  that 
we  call  the  expectation  case  of  the  risk  quadrangle.  The  special  feature  in  this  case  is  that 


    $\mathcal{E}(X) = E[e(X)]$,  $\mathcal{V}(X) = E[v(X)]$,  $\mathcal{U}(Y) = E[u(Y)]$,  (1.9)

for functions $e$ and $v$ on $(-\infty,\infty)$ related to each other by

    $e(x) = v(x) - x$,  $v(x) = e(x) + x$,  (1.10)

and on the other hand, $v$ corresponding to relative utility $u$ through

    $v(x) = -u(-x)$,  $u(y) = -v(-y)$.  (1.11)

The $\mathcal{V} \leftrightarrow \mathcal{E}$ correspondence in Diagram 3 holds under (1.10), while (1.11) ensures that $\mathcal{V}(X) = -\mathcal{U}(-X)$
and $\mathcal{U}(Y) = -\mathcal{V}(-Y)$. The consequences for the $\mathcal{S}$, $\mathcal{R}$ and $\mathcal{D}$ components of the quadrangle, as
generated by the other formulas in Diagram 3, will be discussed in Section 4.
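
The pointwise relations (1.10) and (1.11) carry over to the expectations in (1.9), and this can be seen on a small sample. The sketch below assumes a hypothetical regret integrand $v(x) = x + \lambda x^2$ with $\lambda = 0.5$ and an equally weighted sample; both are chosen only for illustration, and the helper name `expectation` is ours.

```python
import statistics

LAM = 0.5  # hypothetical scaling parameter


def v(x):
    """Hypothetical regret integrand with v(0) = 0."""
    return x + LAM * x * x


def e(x):
    """Error integrand obtained from v through (1.10)."""
    return v(x) - x


def u(y):
    """Relative utility integrand obtained from v through (1.11)."""
    return -v(-y)


def expectation(fn, sample):
    """E[fn(X)] for an equally weighted sample."""
    return statistics.fmean([fn(s) for s in sample])


# Hypothetical equally likely losses X; the matching gains are Y = -X.
losses = [-1.0, 0.0, 2.0, 3.0]
gains = [-x for x in losses]

regret = expectation(v, losses)    # V(X)
error = expectation(e, losses)     # E(X)
utility = expectation(u, gains)    # U(-X)
mean_loss = statistics.fmean(losses)

print("V(X)      =", regret)
print("EX + E(X) =", mean_loss + error)
print("-U(-X)    =", -utility)
```

All three printed values should coincide, reflecting $\mathcal{V}(X) = EX + \mathcal{E}(X)$ and $\mathcal{V}(X) = -\mathcal{U}(-X)$ for this choice of $v$.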

^12 Other issues in statistical “estimation,” such as the convergence of approximations based on sampling, are not taken
up here despite their great importance in the long run. This is due to the lack of space and, in some cases, the imperfect
state of current knowledge. Interesting research challenges abound.

Expected utility is a central notion in decision analysis in economics and likewise in finance, cf.
Follmer and Schied [2004]. Expected error expressions similarly dominate much of statistics, cf.
Gneiting [2011]. Expectation quadrangles provide the connection to those bodies of theory in the
development undertaken here. However, the quadrangle scheme also reveals serious limitations of the
expectation case. Many attractive examples do not fit into it, as will be clear in the sampling of
Section 2. Even expressions $\mathcal{U}(Y) = E[u_0(Y - Y_0)]$, giving expected $u_0$-utility relative to a benchmark
gain $Y_0$, can fail to be directly representable as $\mathcal{U}(Y) = E[u(Y)]$ for a utility function $u$. Departure
from expected utility and expected error is therefore inevitable, if the quadrangle relationships we are
exploring are to reach their full potential for application. This widening of perspective is another of
the contributions we are aiming at here.

2  Some  Examples  Showing  the  Breadth  of  the  Scheme 

Before  going  into  technical  details,  we  will  look  at  an  array  of  examples  aimed  at  illustrating  the  scope 
and  richness  of  the  quadrangle  scheme  and  the  interrelationships  it  reveals.  In  each  case  the  elements 
correspond  to  each  other  in  the  manner  of  Diagram  3.  Some  of  the  connections  are  already  known 
but  have  not  all  been  placed  in  a  single,  comprehensive  picture. 

The first example ties classical safety margins in the risk measure format in optimization and
reliability engineering to the standard tools of least-squares regression. It centers on the mean value
of $X$ as the statistic. The scaling factor $\lambda > 0$ allows the safety margin to come into full play: having
$X$ adequately $\le C$ is interpreted as having $\mu(X)$ at least $\lambda$ standard deviation units below $C$. Through
“regret” a link is made to an associated “utility.” However, as will be seen in Section 3, this quadrangle
lacks an important property of “coherency.”

Example 1: A Mean-Based Quadrangle (with $\lambda > 0$ as a scaling parameter)

    $\mathcal{S}(X) = EX = \mu(X)$ = mean
    $\mathcal{R}(X) = \mu(X) + \lambda\sigma(X)$ = safety margin tail risk
    $\mathcal{D}(X) = \lambda\sigma(X)$ = standard deviation, scaled
    $\mathcal{V}(X) = \mu(X) + \lambda\|X\|_2$ = $L^2$-regret, scaled
    $\mathcal{E}(X) = \lambda\|X\|_2$ = $L^2$-error, scaled

Regression with this $\mathcal{E}$ corresponds through (1.5) to minimizing the standard deviation of the error
$Z_f = Y - f(X_1, \ldots, X_n)$ subject to the mean of the error being 0.

Already here we have an example that is not an expectation quadrangle. Perhaps that may seem
a bit artificial, because the $L^2$-norm could be replaced by its square. That would produce a modified
quadrangle giving the same statistic:

Example 1': Variance Version of Example 1

    $\mathcal{S}(X) = EX = \mu(X)$
    $\mathcal{R}(X) = \mu(X) + \lambda\sigma^2(X)$
    $\mathcal{D}(X) = \lambda\sigma^2(X)$
    $\mathcal{V}(X) = \mu(X) + \lambda\|X\|_2^2 = E[v(X)]$ for $v(x) = x + \lambda x^2$
    $\mathcal{E}(X) = \lambda\|X\|_2^2 = E[e(X)]$ for $e(x) = \lambda x^2$



However, some properties would definitely change. The first version has $\mathcal{R}(X + X') \le \mathcal{R}(X) + \mathcal{R}(X')$,
which is a rule often promoted for measures of risk as part of “coherency” (as explained in Section 3),
but this fails for the second version (although “convexity” persists). A new quadrangle variant of
Examples 1 and 1' with potentially important advantages will be introduced in Example 7.
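
The contrast between the two versions can be seen with a one-line counterexample: take $X' = X$, so that $X + X' = 2X$. The sketch below is a minimal numerical check, assuming an equally weighted sample and $\lambda = 1$; the sample values and function names are hypothetical.

```python
import statistics


def risk_example_1(sample, lam):
    """R(X) = mu(X) + lam * sigma(X), the risk measure of Example 1."""
    return statistics.fmean(sample) + lam * statistics.pstdev(sample)


def risk_example_1_prime(sample, lam):
    """R(X) = mu(X) + lam * sigma^2(X), the risk measure of Example 1'."""
    return statistics.fmean(sample) + lam * statistics.pvariance(sample)


lam = 1.0
x = [0.0, 2.0, 4.0, 6.0]        # hypothetical equally likely outcomes of X
x_prime = list(x)               # X' = X, outcome by outcome
total = [a + b for a, b in zip(x, x_prime)]  # outcomes of X + X' = 2X

# gap = R(X) + R(X') - R(X + X'); subadditivity requires gap >= 0.
gap_1 = (risk_example_1(x, lam) + risk_example_1(x_prime, lam)
         - risk_example_1(total, lam))
gap_1_prime = (risk_example_1_prime(x, lam) + risk_example_1_prime(x_prime, lam)
               - risk_example_1_prime(total, lam))

print("Example 1  gap:", gap_1)        # zero up to round-off
print("Example 1' gap:", gap_1_prime)  # negative: subadditivity fails
```

The variance scales quadratically, so doubling $X$ quadruples $\lambda\sigma^2(X)$ while only doubling $\lambda\sigma(X)$; that is the source of the failure for Example 1'.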

The  next  example  combines  quantile  statistics  with  concepts  coming  from  risk  management  in 
finance  and  engineering.  By  tying  “conditional  value-at-risk,”  on  the  optimization  side,  to  quantile 
regression  (in  contrast  to  least-squares  regression)  as  pioneered  in  statistics  by  Koenker  and  Bassett 
[1978],  it  underscores  a  unity  that  might  go  unrecognized  without  the  risk  quadrangle  scheme. 

The key in this case is provided by the (cumulative) distribution function $F_X(x) = \mathrm{prob}\{X \le x\}$
of a random variable $X$ and the quantile values associated with it. If, for a probability level $\alpha \in (0,1)$,
there is a unique $x$ such that $F_X(x) = \alpha$, that $x$ is the $\alpha$-quantile $q_\alpha(X)$. In general, however, there
are two values to consider as extremes:

$q_\alpha^+(X) = \inf\{\, x \mid F_X(x) > \alpha \,\}, \qquad q_\alpha^-(X) = \sup\{\, x \mid F_X(x) < \alpha \,\}.$ (2.1)

It is customary, when these differ, to take the lower value as “the” $\alpha$-quantile, noting that, because
$F_X$ is right-continuous, this is the lowest $x$ such that $F_X(x) = \alpha$. Here, instead, we will consider the
entire interval between the two competing values as the quantile,

$q_\alpha(X) = [q_\alpha^-(X), q_\alpha^+(X)],$ (2.2)

bearing in mind that this interval usually collapses to a single value. That approach will fit better
with our way of defining a “statistic” by the argmin notation. Also important to understand, in
our context of interpreting $X$ as a “cost” or “loss,” is that the notion of value-at-risk in finance
coincides with quantile. There is an upper value-at-risk $\mathrm{VaR}_\alpha^+(X) = q_\alpha^+(X)$ along with a lower
value-at-risk $\mathrm{VaR}_\alpha^-(X) = q_\alpha^-(X)$, and, in general, a value-at-risk interval
$\mathrm{VaR}_\alpha(X) = [\mathrm{VaR}_\alpha^-(X), \mathrm{VaR}_\alpha^+(X)]$ identical to the quantile interval $q_\alpha(X)$.
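
To make the two extremes in (2.1) concrete, here is a small sketch for a distribution with finitely many outcomes. The helper `quantile_interval` and the tolerance `1e-12` are assumptions made for illustration only.

```python
def quantile_interval(values, probs, alpha):
    """Return (q_minus, q_plus) as in (2.1) for a finite distribution, 0 < alpha < 1."""
    pairs = sorted(zip(values, probs))
    cum = 0.0
    q_minus = q_plus = None
    for v, p in pairs:
        cum += p  # cum is F(v), the cumulative probability up to and including v
        if q_minus is None and cum >= alpha - 1e-12:
            q_minus = v   # lowest outcome with F >= alpha
        if cum > alpha + 1e-12:
            q_plus = v    # lowest outcome with F > alpha
            break
    return q_minus, q_plus

# Four equally likely outcomes: F is flat at level 0.5 between 2 and 3.
interval_half = quantile_interval([1, 2, 3, 4], [0.25] * 4, 0.5)
interval_other = quantile_interval([1, 2, 3, 4], [0.25] * 4, 0.6)
```

At level 0.5 the interval is genuinely two-sided, while at level 0.6 it collapses to a single value, which is the usual situation mentioned in the text.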

Besides value-at-risk, the example coming under consideration involves the conditional value-at-risk
of $X$ at level $\alpha \in (0,1)$ as defined by

$\mathrm{CVaR}_\alpha(X)$ = expectation of $X$ in its $\alpha$-tail, (2.3)

which is also expressible by

$\mathrm{CVaR}_\alpha(X) = \frac{1}{1-\alpha}\int_\alpha^1 \mathrm{VaR}_\tau(X)\,d\tau.$ (2.4)

The second formula is due to Acerbi [2002] in different terminology, while the first follows the pattern
in Rockafellar and Uryasev [2000], where “conditional value-at-risk” was coined.13 Due to applications
of risk theory in areas outside of finance, such as reliability engineering, we believe it is advantageous
to maintain, parallel to value-at-risk and quantile, the ability to refer to the conditional value-at-risk
$\mathrm{CVaR}_\alpha(X)$ equally as the superquantile $\bar q_\alpha(X)$. We will be helped here and later by the notation

$X = X_+ - X_-$ with $X_+ = \max\{0, X\}$, $X_- = \max\{0, -X\}$.


13The $\alpha$-tail distribution of $X$ corresponds to the upper part of the distribution of $X$ having probability $1 - \alpha$. The
interpretation of this for the case when $F_X$ has a jump at the $\alpha$-quantile is worked out in Rockafellar and Uryasev [2002].

Example 2: A Quantile-Based Quadrangle (at any confidence level $\alpha \in (0,1)$)

$\mathcal{S}(X) = \mathrm{VaR}_\alpha(X) = q_\alpha(X)$ = quantile
$\mathcal{R}(X) = \mathrm{CVaR}_\alpha(X) = \bar q_\alpha(X)$ = superquantile
$\mathcal{D}(X) = \mathrm{CVaR}_\alpha(X - EX) = \bar q_\alpha(X - EX)$ = superquantile-deviation
$\mathcal{V}(X) = \frac{1}{1-\alpha}EX_+$ = average absolute loss, scaled14
$\mathcal{E}(X) = E\big[\frac{\alpha}{1-\alpha}X_+ + X_-\big]$ = normalized Koenker-Bassett error

This is an expectation quadrangle with

$e(x) = \frac{\alpha}{1-\alpha}\max\{0, x\} + \max\{0, -x\}, \quad v(x) = \frac{1}{1-\alpha}\max\{0, x\}, \quad u(y) = \frac{1}{1-\alpha}\min\{0, y\}.$

The original Koenker-Bassett error expression differs from the one here by a positive factor. Adjustment
is needed to make it project to the desired $\mathcal{D}$. With respect to this measure of error, regression
has the interpretation in (1.5) that the $\alpha$-superquantile (or $\alpha$-CVaR) deviation of $Z_f$ is minimized
subject to the $\alpha$-quantile of $Z_f$ being 0.
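
The projection from regret to risk in Example 2 can be tried out numerically. The sketch below assumes an equally weighted sample and a level $\alpha$ for which $(1-\alpha)n$ is a whole number; the function names are hypothetical. It compares the tail average in (2.3) with the minimum over $C$ of $C + \frac{1}{1-\alpha}E[X - C]_+$.

```python
def cvar_tail_average(xs, alpha):
    """CVaR as in (2.3); assumes (1 - alpha) * len(xs) is a whole number."""
    k = round((1 - alpha) * len(xs))
    return sum(sorted(xs)[-k:]) / k

def cvar_by_regret(xs, alpha):
    """Minimize C + V(X - C) over C, with V(X) = E[X_+] / (1 - alpha)."""
    def objective(c):
        return c + sum(max(0.0, x - c) for x in xs) / (len(xs) * (1 - alpha))
    # The objective is convex and piecewise linear in C with kinks at the
    # sample values, so some sample value is a minimizer.
    best_c = min(xs, key=objective)
    return objective(best_c), best_c

sample = [3.0, -1.0, 4.0, 1.0, 5.0, -9.0, 2.0, 6.0, 0.0, 8.0]
risk_value, minimizing_c = cvar_by_regret(sample, 0.8)
tail_value = cvar_tail_average(sample, 0.8)
```

Here the two values agree, and the minimizing $C$ falls in the quantile interval $[5, 6]$ at level 0.8, in line with the statistic being the quantile.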

The  targeting  of  average  loss  as  the  source  of  “regret”  in  Example  2  is  interesting  because  of  the 
role  that  average  loss  has  long  had  in  stochastic  optimization,  but  also  through  the  scaling  feature.  In 
the  past,  such  scaling  might  have  been  thought  immaterial,  but  this  quadrangle  shows  that  it  identifies 
a  particular  loss  quantile  having  a  special  role. 

Example 2 confirms the motivations in Section 1 for looking at entire quadrangles. Consider a
stochastic optimization problem in the form of (1.2). It is tempting, and common in many applications,
to contemplate taking $\mathcal{R}_i$ to be a quantile $q_{\alpha_i}$. The constraint $\mathcal{R}_i(X_i(x)) \le c_i$ would require then
that $x$ be chosen so that the random “cost” $X_i(x)$ is $\le c_i$ with probability at least $\alpha_i$. However,
this apparently natural approach suffers from the fact that $q_{\alpha_i}(X_i(x))$ may be poorly behaved as a
function of $x$ as well as subject to the indeterminacy, or discontinuity, associated with (2.2). That
could hamper computation and lead to instability of solutions.

An alternative to a quantile would be to take $\mathcal{R}_i$ to be a superquantile $\bar q_{\alpha_i}$. The constraint
$\mathcal{R}_i(X_i(x)) \le c_i$, as an expression of $X_i(x)$ being “adequately” $\le c_i$, is then more conservative and
has an interpretation in terms of “buffered probability of failure,” cf. Rockafellar and Royset [2010].
Moreover it is better behaved and able to preserve convexity of $X_i(x)$ with respect to $x$, if present. A
further advantage in optimization from such an approach is seen from the projection from $\mathcal{V}$ to $\mathcal{R}$ on
the left side of the quadrangle:

$\bar q_{\alpha_i}(X_i(x)) \le c_i \iff C_i + \frac{1}{1-\alpha_i}E[\max\{0, X_i(x) - C_i\}] \le c_i$ for a choice of $C_i \in \mathbb{R}$.

Thus, a superquantile (or CVaR) constraint can be reformulated as something simpler through the
introduction of another decision variable $C_i$ alongside of $x$.15 In some situations the expectation
term in the reformulation can even be handled through linear programming. This first came out in
Rockafellar and Uryasev [2000], but the point to be emphasized here is that such a device is not limited
to superquantiles. The same effect can be achieved with a risk measure $\mathcal{R}_i$ and regret measure $\mathcal{V}_i$ pair
from any quadrangle (with “regularity”), replacing $\mathcal{R}_i(X_i(x)) \le c_i$ by $C_i + \mathcal{V}_i(X_i(x) - C_i) \le c_i$, and
the variable $C_i$ ends up then in optimality as $\mathcal{S}_i(X_i(x))$; see the Regret Theorem in Section 5.

14Average  absolute  loss  as  “regret,”  and  as  inspiration  for  the  terminology  we  are  introducing  here  more  broadly,  goes 
back  to  Dembo  and  King  [1992]  in  stochastic  programming. 

15Minimizing $\bar q_{\alpha_0}(X_0(x))$ in $x$ converts likewise to minimizing $C_0 + \frac{1}{1-\alpha_0}E[\max\{0, X_0(x) - C_0\}]$ in $x$ and $C_0$.

The fact that the $\mathcal{D}$-$\mathcal{E}$ side of the quadrangle in Example 2 corresponds to quantile regression has
important implications as well. It was explained in Section 1 that factor models might be employed
to replace $X_i(x)$ by some $\hat X_i(x)$ through regression, and that evidence suggests selecting for this
regression the error measure $\mathcal{E}_i$ in the same quadrangle as the risk measure $\mathcal{R}_i$. It follows that, in
an optimization problem (1.2) with objective and constraints of superquantile/CVaR type, quantile
regression is perhaps most appropriate, at least in some linear models,16 and should even be carried
out at the $\alpha_i$ threshold chosen for each $i$.17 Another observation is that quantile regression at level
$\alpha_i$ turns into minimization of the superquantile/CVaR deviation measure $\mathcal{D}_{\alpha_i}(X) = \bar q_{\alpha_i}(X - EX)$
for $X = Z_f$ in (1.4). This is laid out in general by the Regression Theorem of Section 5. Only the
quadrangle scheme is capable of bringing all this together.

The special case of Example 2 in which the quantile is the median is worth looking at directly.
It corresponds to the error measure being the $\mathcal{L}^1$-norm in contrast to Example 1, where the error
measure was the $\mathcal{L}^2$-norm. This furnishes a statistical alternative to least-squares regression in which
the mean is replaced by the median, which may in some situations be a better way of centering a
random cost.18 Regression comes out then in (1.5) as minimizing the mean absolute deviation of the
error random variable $Z_f$ subject to it having 0 as its median.

Example 3: A Median-Based Quadrangle (the quantile case for $\alpha = \frac{1}{2}$)

$\mathcal{S}(X) = \mathrm{VaR}_{1/2}(X) = q_{1/2}(X)$ = median
$\mathcal{R}(X) = \mathrm{CVaR}_{1/2}(X) = \bar q_{1/2}(X)$ = “supermedian” (average in median-tail)
$\mathcal{D}(X) = E|X - q_{1/2}(X)|$ = mean absolute deviation
$\mathcal{V}(X) = 2EX_+$ = $\mathcal{L}^1$-regret
$\mathcal{E}(X) = E|X|$ = $\mathcal{L}^1$-error
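
The two routes to the deviation in Example 3 can be compared on a small sample. This is an illustrative sketch, assuming four equally likely outcomes; the variable names are not from the paper. One route projects the $\mathcal{L}^1$-error by minimizing $E|X - C|$ over $C$, the other subtracts $EX$ from the supermedian.

```python
def mean_abs_dev_from(xs, c):
    # E|X - c| for an equally weighted sample
    return sum(abs(x - c) for x in xs) / len(xs)

xs = [0.0, 1.0, 2.0, 7.0]

# Error side: min over C of E|X - C|; a minimizer can be found among the sample values.
dev_from_error = min(mean_abs_dev_from(xs, c) for c in xs)

# Risk side: CVaR_{1/2}(X) - EX, with CVaR_{1/2} taken as the mean of the upper half.
upper_half = sorted(xs)[len(xs) // 2:]
dev_from_risk = sum(upper_half) / len(upper_half) - sum(xs) / len(xs)
```

Both computations give the mean absolute deviation from the median, here equal to 2.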

For the sake of comparison, it is instructive to ask what happens if the error measure $\mathcal{E}$ on the
estimation side, and in potential application to generalized regression, is the $\mathcal{L}^\infty$-norm. This leads to
our fourth example, which emphasizes the case where $X$ is (essentially) bounded.

Example 4: A Range-Based Quadrangle (with $\lambda > 0$ as a scaling parameter)

$\mathcal{S}(X) = \frac{1}{2}[\sup X + \inf X]$ = center of range of $X$ (if bounded)
$\mathcal{R}(X) = EX + \frac{\lambda}{2}[\sup X - \inf X]$ = range-buffered risk, scaled
$\mathcal{D}(X) = \frac{\lambda}{2}[\sup X - \inf X]$ = radius of the range of $X$ (maybe $\infty$), scaled
$\mathcal{V}(X) = EX + \lambda\sup|X|$ = $\mathcal{L}^\infty$-regret, scaled
$\mathcal{E}(X) = \lambda\sup|X|$ = $\mathcal{L}^\infty$-error, scaled

This is not an expectation quadrangle. Having $X$ adequately $\le C$ means here that $X$ is kept below $C$
by a margin equal to $\lambda$ times the radius of the range of $X$. The interpretation of regression provided
by (1.5) is that the radius of the range of $Z_f$ is minimized subject to its center being at 0.

The example offered next identifies both as the statistic and as the risk the “worst cost” associated
with $X$. It can be regarded as the limit of the quantile-based quadrangle in Example 2 as $\alpha \to 1$.

16This  is  a  fertile  topic  for  more  research. 

17 Again,  this  is  an  insight  applicable  not  only  to  Example  2,  but  to  any  of  the  other  quadrangles  that  will  come  up. 

18Furthermore,  this  quadrangle  and  the  other  quantile  quadrangles  in  Example  2  will  be  seen  to  exhibit  the  “coherency” 
that  was  lacking  in  Example  1,  and  for  that  matter,  Example  1'. 
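
Example 5: A Worst-Case-Based Quadrangle

$\mathcal{S}(X) = \sup X$ = top of the range of $X$ (maybe $\infty$)
$\mathcal{R}(X) = \sup X$ = yes, the same as $\mathcal{S}(X)$
$\mathcal{D}(X) = \sup X - EX$ = span of the upper range of $X$ (maybe $\infty$)
$\mathcal{V}(X) = \begin{cases} 0 & \text{if } X \le 0 \\ \infty & \text{if } X \not\le 0 \end{cases}$ = worst-case-regret
$\mathcal{E}(X) = \begin{cases} E|X| & \text{if } X \le 0 \\ \infty & \text{if } X \not\le 0 \end{cases}$ = worst-case-error

This is another expectation quadrangle but with functions of unusual appearance:

$e(x) = \begin{cases} -x & \text{if } x \le 0 \\ \infty & \text{if } x > 0 \end{cases}, \quad v(x) = \begin{cases} 0 & \text{if } x \le 0 \\ \infty & \text{if } x > 0 \end{cases}, \quad u(y) = \begin{cases} -\infty & \text{if } y < 0 \\ 0 & \text{if } y \ge 0. \end{cases}$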


The “range” of $X$ here is its essential range, i.e., the smallest closed interval in which outcomes must
lie with probability 1. Thus, in Example 5, the inequality $\mathcal{R}(X) \le C$ gives the risk-measure code for
insisting that $X \le C$ with probability 1. The regret measure $\mathcal{V}(X)$ assigns infinite penalty when this
is violated, but no disincentive otherwise. The regression associated with the quadrangle in Example
5 is one-sided. It corresponds in (1.5) to minimizing $|EZ_f|$ subject to $\sup Z_f = 0$.

A major attraction of the risk measure in Example 5 is that, on the surface at least, it apparently
bypasses having to think about probabilities. This is the central theme of so-called “robust
optimization.”19 However, a generalization can be made in which some additional probabilistic insights are
available, and the appraisal of “worst” is distributed over different visions of the future tied to a coarse
level of probability modeling. The details will not be fully understandable until we begin posing risk
in the rigorous framework of a probability space in Section 3 (and all the more in Section 6), but we
proceed anyway here to a suggestive preliminary formulation. It depends on partitioning the underlying
uncertainty about the future into several different “sets of circumstances” $k = 1,\dots,r$ having no
overlap20 and letting


$p_k$ = probability of the $k$th set of circumstances, with $p_k > 0$, $p_1 + \cdots + p_r = 1$,
$\sup_k X$ = worst of $X$ under circumstances $k$, for $k = 1,\dots,r$, (2.5)
$E_k X$ = conditional expectation of $X$ under circumstances $k$.

The last implies, of course, that $p_1E_1X + \cdots + p_rE_rX = EX$.

Example 6: A Distributed-Worst-Case-Based Quadrangle (with respect to (2.5))

$\mathcal{S}(X) = p_1\sup_1 X + \cdots + p_r\sup_r X$
$\mathcal{R}(X) = p_1\sup_1 X + \cdots + p_r\sup_r X$ = yes, the same as $\mathcal{S}(X)$
$\mathcal{D}(X) = p_1[\sup_1 X - E_1X] + \cdots + p_r[\sup_r X - E_rX]$

19In that subject, probabilistic assessments typically enter nevertheless through construction of an “uncertainty set”
consisting of the future states or scenarios deemed worthy of consideration in the worst-case analysis. That uncertainty
set can be identified as the $\Omega$ set in the probability-space underpinnings of risk theory explained in Section 3.

20Technically this refers to “events” as measurable subsets $\Omega_k$ of the probability space $\Omega$ introduced in the next section.
For more, see also (6.10) and what precedes it. In the context of “robust optimization,” one can think of the chosen
“uncertainty set” $\Omega$ being partitioned into a number of smaller uncertainty sets $\Omega_k$ to which relative probabilities can be
assigned. By admitting various degrees of fineness in the partitioning (fields of sets providing “filters of information”),
a bridge is provided between different layers of probability knowledge.

$\mathcal{V}(X) = \begin{cases} 0 & \text{if } p_1\sup_1 X + \cdots + p_r\sup_r X \le 0, \\ \infty & \text{if } p_1\sup_1 X + \cdots + p_r\sup_r X > 0 \end{cases}$


$\mathcal{E}(X) = \begin{cases} E|p_1\sup_1 X + \cdots + p_r\sup_r X| & \text{if } p_1\sup_1 X + \cdots + p_r\sup_r X \le 0, \\ \infty & \text{if } p_1\sup_1 X + \cdots + p_r\sup_r X > 0 \end{cases}$

This novel example is again not an expectation quadrangle. Moreover, unlike the previous cases, the
quantifiers in Example 6 are not “law-invariant,” i.e., their effects on $X$ depend on more than just the
distribution function $F_X$. It should be noted that expectations only enter the elements on the right
side of this quadrangle. As far as optimization is concerned, by itself, there are no assumptions about
probability structure other than the first line of (2.5). This can be regarded as a compromise between
the starkness of Example 5 and a full-scale probability model.
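
As a small illustration of that compromise, the risk in Example 6 can be evaluated from nothing more than the partition probabilities and the outcomes possible in each set of circumstances. The sketch below assumes finitely many outcomes per set; the function name and the data are hypothetical.

```python
def distributed_worst_case(outcomes_by_group, probs):
    """R(X) = S(X) = sum over k of p_k * sup_k X for a finite partition."""
    return sum(p * max(group) for group, p in zip(outcomes_by_group, probs))

# Two sets of circumstances with probabilities 0.25 and 0.75.
groups = [[1.0, 3.0], [-2.0, 0.0]]
probs = [0.25, 0.75]

mixed_worst = distributed_worst_case(groups, probs)
overall_worst = max(max(group) for group in groups)  # Example 5 would give this
```

No probabilities inside the individual groups are needed, and the resulting value does not exceed the overall worst case of Example 5.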

We turn now to an example motivated especially from the estimation side. It concerns an expectation
quadrangle which interpolates between Examples 1' and 3 by looking at an error expression like
the one in Huber’s modification of least-squares regression in order to mollify the influence of outliers.
We introduce a scaling parameter $\beta > 0$ and make use of the $\beta$-truncation function

$T_\beta(x) = \begin{cases} \beta & \text{when } x \ge \beta, \\ x & \text{when } -\beta \le x \le \beta, \\ -\beta & \text{when } x \le -\beta. \end{cases}$

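Example 7: A Truncated-Mean-Based Quadrangle (with scaling parameter $\beta > 0$)

$\mathcal{S}(X) = \mu_\beta(X)$ = value of $C$ such that $E[T_\beta(X - C)] = 0$
$\mathcal{R}(X) = \mu_\beta(X) + E[v(X - \mu_\beta(X))]$ for $v$ as below
$\mathcal{D}(X) = E[e(X - \mu_\beta(X))]$ for $e$ as below
$\mathcal{V}(X) = E[v(X)]$ with $v(x) = \begin{cases} -\frac{\beta}{2} & \text{when } x \le -\beta \\ x + \frac{1}{2\beta}x^2 & \text{when } |x| \le \beta \\ 2x - \frac{\beta}{2} & \text{when } x \ge \beta \end{cases}$
$\mathcal{E}(X) = E[e(X)]$ with $e(x) = \begin{cases} |x| - \frac{\beta}{2} & \text{when } |x| \ge \beta \\ \frac{1}{2\beta}x^2 & \text{when } |x| \le \beta \end{cases}$ = Huber-type error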


In the limit of $\mu_\beta(X)$ as $\beta \to \infty$, we end up with just $EX$, as in Examples 1 and 1'. For the
deviation measure $\mathcal{D}$ in Example 7, one can think of $2\beta\mathcal{D}(X)$ as the $\beta$-truncation $\sigma_\beta^2(X)$ of $\sigma^2(X)$.
It approaches that variance as $\beta \to \infty$. In the corresponding regression, interpreted through (1.5),
$\sigma_\beta^2(Z_f)$ is minimized subject to $\mu_\beta(Z_f) = 0$ for the error random variable $Z_f$. This contrasts with
minimizing $\sigma^2(Z_f)$ subject to $\mu(Z_f) = 0$ in Examples 1 and 1'.
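
The statistic of Example 7 has no closed form in general, but the defining equation $E[T_\beta(X - C)] = 0$ is easy to solve numerically. The following is one possible sketch using bisection on an equally weighted sample; the function names and the tolerance are assumptions for illustration.

```python
def truncate(x, beta):
    # T_beta(x): clip x to the interval [-beta, beta]
    return max(-beta, min(beta, x))

def truncated_mean(xs, beta, tol=1e-10):
    """Solve E[T_beta(X - C)] = 0 for C by bisection.

    The left side is nonincreasing in C, nonnegative at C = min(xs)
    and nonpositive at C = max(xs), so a root lies in between.
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(truncate(x - mid, beta) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [0.0, 1.0, 2.0, 3.0, 100.0]  # one large outlier
robust_center = truncated_mean(data, 2.0)
large_beta_center = truncated_mean(data, 1000.0)
```

With $\beta = 2$ the outlier is capped and the statistic stays at 2, whereas for $\beta$ large enough that nothing is truncated the ordinary mean 21.2 is recovered, matching the limit described above.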

As a quadrangle, Example 7 is brand new. Its noteworthy feature, as contrasted with the limiting
case in Example 1', is that its $v(x)$ is a nondecreasing convex function of $x$.21 In consequence, $\mathcal{V}$ and
$\mathcal{R}$ will be “monotonic” (as defined in Section 3) and their dualizations (in Section 6) will fit into a
framework of probability which the dualizations coming out of Example 1' cannot attain.

The next quadrangle, again in the expectation case, looks very different. The log-exponential risk
measure at the heart of it is a recognized tool in risk theory in finance,22 but its connection with a
form of generalized regression, by way of the $\mathcal{D}$-$\mathcal{E}$ side of the quadrangle, has not previously been
contemplated. As in Examples 5 and 6, the risk $\mathcal{R}(X)$ equals the statistic $\mathcal{S}(X)$.


21And the associated utility $u(y)$ will be a nondecreasing concave function of $y$.
22This is called entropic risk in Föllmer and Schied [2004].

Example 8: A Log-Exponential-Based Quadrangle

$\mathcal{S}(X) = \log E[\exp X]$ = expression dual to Boltzmann-Shannon entropy23
$\mathcal{R}(X) = \log E[\exp X]$ = yes, the same as $\mathcal{S}(X)$
$\mathcal{D}(X) = \log E[\exp(X - EX)]$ = log-exponential deviation
$\mathcal{V}(X) = E[\exp X - 1]$ = exponential regret $\longleftrightarrow$ $\mathcal{U}(Y) = E[1 - \exp(-Y)]$
$\mathcal{E}(X) = E[\exp X - X - 1]$ = (unsymmetric) exponential error

Regression here can be interpreted by (1.5) as minimizing $\log E[\exp(Z_f - EZ_f)] = \log E[\exp Z_f] - EZ_f$
subject to $\log E[\exp Z_f] = 0$, or equivalently minimizing $|EZ_f|$ subject to $E[\exp Z_f] = 1$ (since the
latter implies $\exp EZ_f \le 1$, hence $EZ_f \le 0$).
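
That the risk and the statistic coincide in Example 8 can be seen numerically from the regret side. The sketch below assumes an equally weighted sample and uses illustrative function names: the expression $C + \mathcal{V}(X - C)$ with $\mathcal{V}(X) = E[\exp X - 1]$ should be smallest at $C = \log E[\exp X]$ and take that same value there.

```python
import math

def log_exp_risk(xs):
    # log E[exp X] for an equally weighted sample
    return math.log(sum(math.exp(x) for x in xs) / len(xs))

def regret_objective(xs, c):
    # C + V(X - C) with V(X) = E[exp X - 1]
    return c + sum(math.exp(x - c) - 1.0 for x in xs) / len(xs)

xs = [-1.0, 0.0, 0.5, 2.0]
c_star = log_exp_risk(xs)
value_at_c_star = regret_objective(xs, c_star)
value_left = regret_objective(xs, c_star - 0.1)
value_right = regret_objective(xs, c_star + 0.1)
```

At `c_star` the expectation of $\exp(X - C)$ equals 1, so the objective reduces to `c_star` itself, and moving $C$ to either side increases it.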

The regret $\mathcal{V}$ in Example 8 is paired with an expected utility expression that is commonly employed
in finance: we are in the expectation case with

$e(x) = \exp x - x - 1, \quad v(x) = \exp x - 1, \quad u(y) = 1 - \exp(-y).$


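Such utility pairing is seen also in the coming Example 9, which fits the expectation case with

$e(x) = \begin{cases} \log\frac{1}{1-x} - x & \text{if } x < 1 \\ \infty & \text{if } x \ge 1, \end{cases} \quad v(x) = \begin{cases} \log\frac{1}{1-x} & \text{if } x < 1 \\ \infty & \text{if } x \ge 1, \end{cases} \quad u(y) = \begin{cases} \log(1+y) & \text{if } y > -1 \\ -\infty & \text{if } y \le -1. \end{cases}$

Example 9: A Rate-Based Quadrangle

$\mathcal{S}(X) = r(X)$ = unique $C \ge \sup X - 1$ such that $E\big[\frac{1}{1-X+C}\big] = 1$
$\mathcal{R}(X) = r(X) + E\big[\log\frac{1}{1-X+r(X)}\big]$
$\mathcal{D}(X) = r(X) + E\big[\log\frac{1}{1-X+r(X)} - X\big]$
$\mathcal{V}(X) = E\big[\log\frac{1}{1-X}\big]$ $\longleftrightarrow$ $\mathcal{U}(Y) = E[\log(1+Y)]$
$\mathcal{E}(X) = E\big[\log\frac{1}{1-X} - X\big]$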

We have dubbed this quadrangle “rate-based” because, in the utility connection, $\log(1+y)$ is an
expression applied to a rate of gain $y$ (which of necessity is $> -1$); cf. Luenberger [1998], Chap. 15,
for the role of this in finance. Correspondingly in $\log\frac{1}{1-x}$, we are dealing with a rate of loss.
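
The statistic $r(X)$ in Example 9 is defined only implicitly. As a hedged sketch, assuming an equally weighted finite sample, it can be located by bisection, because $E[1/(1 - X + C)]$ decreases in $C$, blows up as $C$ approaches $\sup X - 1$ from above, and is at most 1 at $C = \sup X$. The function name and tolerance are illustrative choices.

```python
def rate_statistic(xs, tol=1e-12):
    """Find C in (sup X - 1, sup X] with E[1 / (1 - X + C)] = 1 by bisection."""
    top = max(xs)
    lo, hi = top - 1.0, top
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)  # mid > top - 1, so every denominator is positive
        g = sum(1.0 / (1.0 - x + mid) for x in xs) / len(xs)
        if g > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

losses = [-0.5, 0.0, 0.4]
r_value = rate_statistic(losses)
residual = sum(1.0 / (1.0 - x + r_value) for x in losses) / len(losses) - 1.0
r_constant = rate_statistic([0.3, 0.3])  # a constant loss rate
```

For a constant loss rate the statistic is that constant, as one would want from any quadrangle statistic.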

The next two examples in this section lie again outside the expectation case and present a more
complicated picture where error and regret are defined by an auxiliary operation of minimization. The
first concerns “mixed” quantiles/VaR and superquantiles/CVaR. The idea, from the risk measure
perspective, is to study expressions of the type

$\mathcal{R}(X) = \int_0^1 \mathrm{CVaR}_\alpha(X)\,d\lambda(\alpha)$ (2.6)

for any weighting measure $\lambda$ on $(0,1)$ (nonnegative with total measure 1). In particular, if $\lambda$ is
comprised of atoms with weights $\lambda_k > 0$ at points $\alpha_k$ for $k = 1,\dots,r$, with $\lambda_1 + \cdots + \lambda_r = 1$, one gets

$\mathcal{R}(X) = \lambda_1\mathrm{CVaR}_{\alpha_1}(X) + \cdots + \lambda_r\mathrm{CVaR}_{\alpha_r}(X).$ (2.7)

23The  “exp”  notation  is  adopted  so  as  not  to  conflict  with  the  convenient  use  of  “e”  for  error  integrands  in  (1.9). 

The question is whether this can be placed in a full quadrangle in the format of Diagrams 1, 2 and 3.

Incentive comes from the fact that such risk measures have a representation as “spectral measures”
in the sense of Acerbi [2002], which capture preferences in terms of “risk profiles.”24 We proved in
[Rockafellar et al., 2006a, Proposition 5] (echoing our working paper Rockafellar et al. [2002]) that,
as long as the weighting measure $\lambda$ satisfies $\int_0^1 (1-\alpha)^{-1}d\lambda(\alpha) < \infty$, the risk measure in (2.7) can
equivalently be expressed in the form

$\mathcal{R}(X) = \int_0^1 \mathrm{VaR}_\tau(X)\,\varphi(\tau)\,d\tau$ with $\varphi(\tau) = \int_{(0,\tau]} (1-\alpha)^{-1}d\lambda(\alpha),$ (2.8)

where the function $\varphi$, defined on $(0,1)$, gives the risk profile.25

The risk profile for a single “unmixed” risk measure $\mathrm{CVaR}_\alpha$ is the function $\varphi_\alpha$ that has the value
$1/(1-\alpha)$ on $[\alpha, 1)$ but 0 on $(0, \alpha)$; this corresponds to formula (2.4). Moreover the risk profile for a
weighted CVaR sum as in (2.7) would be the step function $\varphi = \lambda_1\varphi_{\alpha_1} + \cdots + \lambda_r\varphi_{\alpha_r}$.

Although the quadrangle that would serve for a general weighting measure in (2.6) is still a topic
of research, the special case in (2.7) is accessible from the platform of Rockafellar et al. [2008], which
will be widened in Section 4 (in the Mixing Theorem).

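Example 10: A Mixed-Quantile-Based Quadrangle
(for any confidence levels $\alpha_k \in (0,1)$ and weights $\lambda_k > 0$, $\sum_{k=1}^r \lambda_k = 1$)

$\mathcal{S}(X) = \sum_{k=1}^r \lambda_k q_{\alpha_k}(X) = \sum_{k=1}^r \lambda_k\mathrm{VaR}_{\alpha_k}(X)$ = a mixed quantile26
$\mathcal{R}(X) = \sum_{k=1}^r \lambda_k \bar q_{\alpha_k}(X) = \sum_{k=1}^r \lambda_k\mathrm{CVaR}_{\alpha_k}(X)$ = a mixed superquantile
$\mathcal{D}(X) = \sum_{k=1}^r \lambda_k \bar q_{\alpha_k}(X - EX) = \sum_{k=1}^r \lambda_k\mathrm{CVaR}_{\alpha_k}(X - EX)$
= the corresponding mixture of superquantile deviations
$\mathcal{V}(X) = \min_{B_1,\dots,B_r}\big\{ \sum_{k=1}^r \lambda_k\mathcal{V}_{\alpha_k}(X - B_k) \,\big|\, \sum_{k=1}^r \lambda_k B_k = 0 \big\}$
= a derived balance of the regrets $\mathcal{V}_{\alpha_k}(X) = \frac{1}{1-\alpha_k}EX_+$
$\mathcal{E}(X) = \min_{B_1,\dots,B_r}\big\{ \sum_{k=1}^r \lambda_k\mathcal{E}_{\alpha_k}(X - B_k) \,\big|\, \sum_{k=1}^r \lambda_k B_k = 0 \big\}$
= a derived balance of the errors $\mathcal{E}_{\alpha_k}(X) = E\big[\frac{\alpha_k}{1-\alpha_k}X_+ + X_-\big]$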

The case of a general weighting measure may be approximated this way arbitrarily closely, as can
very well be seen through the corresponding risk profiles. When the measure is concentrated in finitely
many points, the corresponding profile function $\varphi$ in (2.9) is a step function, and vice versa, as already
noted. An arbitrary profile function $\varphi$ (fulfilling the conditions indicated above in a footnote) can be
approximated by a profile function that is a step function.

A highly interesting use for the quadrangle of Example 10 is the mixed quantile approximation of
a superquantile. According to (2.4), the value $\bar q_\alpha(X) = \mathrm{CVaR}_\alpha(X)$ can be obtained by calculating
the integral of $q_\tau(X) = \mathrm{VaR}_\tau(X)$ over $\tau \in [\alpha, 1]$. Classical numerical approaches introduce a finite

24Such profiles occur in “dual utility theory,” a subject addressed by Yaari [1987] and Roell [1987] and recently revisited
with greater rigor by Dentcheva and Ruszczyński [2013]. Their integrals are the “concave distortion” functions seen in
finance and insurance theory, cf. Föllmer and Schied [2004], Pflug [2009].

25This function $\varphi$ is right-continuous and nondecreasing with $\varphi(0^+) = 0$, $\varphi(1^-) < \infty$ and $\int_0^1 \varphi(\tau)\,d\tau = 1$. Conversely,
any function $\varphi$ with those properties arises from a unique choice of $\lambda$ as described. The cited sources have a reversed
formula due to $X$ being gain-oriented instead of loss-oriented, as here.

26This kind of sum, in which some of the terms could be intervals, is to be interpreted in general as referring to all
results obtained by selecting particular values within those intervals.

subdivision of the interval $[\alpha, 1]$ and replace the integrand by a nearby step function or piecewise linear
function based on the quantiles marking that subdivision. It is easy to see that the value of the
integral for that approximated integrand is actually a mixed quantile expression. The conclusion is
that versions of the quadrangle of Example 10 can serve as approximations to a superquantile-based
quadrangle parallel to the quantile-based quadrangle of Example 2.27 In this manner, superquantile
regression, in which the statistic is a superquantile instead of a quantile, can be carried out.
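
One way to picture the mixed quantile approximation is a midpoint rule applied to (2.4). The sketch below is only an illustration under stated assumptions: an equally weighted sample, equal weights $1/r$, and levels at the midpoints of $r$ equal pieces of $[\alpha, 1]$. The helper names are hypothetical.

```python
import math

def lower_quantile(xs, tau):
    """Lowest sample value x whose empirical distribution function is >= tau."""
    s = sorted(xs)
    idx = max(0, math.ceil(tau * len(s) - 1e-12) - 1)
    return s[idx]

def cvar_by_mixed_quantiles(xs, alpha, r):
    """Midpoint rule for (2.4): weights 1/r on quantiles at r levels in (alpha, 1)."""
    width = (1.0 - alpha) / r
    levels = [alpha + (k + 0.5) * width for k in range(r)]
    return sum(lower_quantile(xs, t) for t in levels) / r

sample = [3.0, -1.0, 4.0, 1.0, 5.0, -9.0, 2.0, 6.0, 0.0, 8.0]
approx = cvar_by_mixed_quantiles(sample, 0.8, 2)
```

For this sample the quantile function is a step function whose jumps line up with the subdivision, so the mixed quantile with levels 0.85 and 0.95 already reproduces the tail average of the two largest values, namely 7. For general distributions the result is only an approximation that improves as $r$ grows.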

Although the mixed superquantile/CVaR risk measures $\mathcal{R}$ in Example 10 have a well recognized
importance in expressing preferences toward risk, through the profiles explained above,28 the
identification in this quadrangle of a corresponding “optimally mixed” regret measure $\mathcal{V}$ for such $\mathcal{R}$ is
new. The associated error measure $\mathcal{E}$ is the one thereby indicated for use in regression approximations
where this kind of risk measure is involved.

It is worth emphasizing that the min expressions for $\mathcal{V}$ and $\mathcal{E}$ in Example 10 are no impediment
at all in practice when applied to optimization or regression. For instance, the trick explained after
Example 2 for simplifying a superquantile/CVaR constraint through the introduction of an additional
decision variable works here as well. The only difference is that still more decision variables
corresponding to the $B_k$'s in the quadrangle are introduced, too.

The following example likewise offers something new as far as risk measures and potential applications
in regression are concerned, although the “statistic” in question has already come up in mortgage
pipeline hedging; see AORDA [2010].

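Example 11: A Quantile-Radius-Based Quadrangle (for any $\alpha \in (1/2, 1)$ and $\lambda > 0$)

$\mathcal{S}(X) = \frac{1}{2}[q_\alpha(X) - q_{1-\alpha}(X)] = \frac{1}{2}[\mathrm{VaR}_\alpha(X) - \mathrm{VaR}_{1-\alpha}(X)]$
= the $\alpha$-quantile radius of $X$, or $\frac{1}{2}$-two-tail-$\mathrm{VaR}_\alpha$ of $X$
$\mathcal{R}(X) = EX + \frac{\lambda}{2}[\bar q_\alpha(X) + \bar q_\alpha(-X)] = EX + \frac{\lambda}{2}[\mathrm{CVaR}_\alpha(X) + \mathrm{CVaR}_\alpha(-X)]$
= reverted $\mathrm{CVaR}_\alpha$, scaled
$\mathcal{D}(X) = \frac{\lambda}{2}[\bar q_\alpha(X) + \bar q_\alpha(-X)] = \frac{\lambda}{2}[\mathrm{CVaR}_\alpha(X) + \mathrm{CVaR}_\alpha(-X)]$
= the $\alpha$-superquantile radius of $X$, scaled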

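$\mathcal{V}(X) = EX + \lambda\min_B\big\{ \frac{1}{2(1-\alpha)}E\big[[B + X]_+ + [B - X]_+\big] - B \big\}$
= $\alpha$-quantile-radius regret in $X$, scaled
$\mathcal{E}(X) = \lambda\min_B\big\{ \frac{1}{2(1-\alpha)}E\big[[B + X]_+ + [B - X]_+\big] - B \big\}$
= $\alpha$-quantile-radius error in $X$, scaled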

This  example  will  be  justified  and  extended  in  Section  3  (through  the  Reverting  Theorem). 

As the final example in this section, we offer a generalization of Example 2 to the “higher-order
moment risk measures” introduced in Krokhmal [2007] and further analyzed recently in Dentcheva,
Penev and Ruszczyński [2013]. The “quantile” terminology does not come from those works and is
only imposed here in suggestion of the strong parallels with the earlier quantile-based quadrangle,
which would be the case where $p = 1$.

Example 12: A Higher-Order Quantile-Based Quadrangle (for $\alpha \in (0,1)$, $p \in (1,\infty)$)

$\mathcal{S}(X) = q_\alpha^{(p)}(X)$ = $p$-moment quantile
$\mathcal{R}(X) = \bar q_\alpha^{(p)}(X)$ = $p$-moment superquantile

27In  Rockafellar  and  Royset  [2013],  direct  expressions  for  the  elements  of  this  quadrangle  are  given. 

28See  also  the  theoretical  developments  in  [Follmer  and  Schied,  2004,  Chapter  4.5]. 

$\mathcal{D}(X) = \bar q_\alpha^{(p)}(X - EX)$ = $p$-moment superquantile-deviation
$\mathcal{V}(X) = \frac{1}{1-\alpha}\|X_+\|_p$ = $p$-normed absolute loss, scaled
$\mathcal{E}(X) = \frac{1}{1-\alpha}\|X_+\|_p - EX$ = $p$-moment quantile error

The $p$-moment quantile $q_\alpha^{(p)}(X)$ is known to be characterized by the equation

$(1-\alpha)^{1/(p-1)} = \big\|(X - q_\alpha^{(p)}(X))_+\big\|_{p-1} \big/ \big\|(X - q_\alpha^{(p)}(X))_+\big\|_p.$

For this and other properties, see Krokhmal [2007].

What considerations have to be faced in constructing further quadrangle examples? For instance,
is there a full quadrangle with $\mathcal{R}(X) = EX$, or with $\mathcal{R}(X) = \mathrm{VaR}_\alpha(X) = q_\alpha(X)$? The answer is yes
in both cases, provided that $\mathrm{VaR}_\alpha(X)$ and $q_\alpha(X)$ (which can be intervals in our setting) are replaced
by $\mathrm{VaR}_\alpha^-(X)$ and $q_\alpha^-(X)$, say, but the resulting quadrangles are “not interesting.” For $\mathcal{R}(X) = EX$,
we must have $\mathcal{D}(X) = 0$ in accordance with Diagram 3. An associated measure of error would be
$\mathcal{E}(X) = |EX|$, which is paired with $\mathcal{V}(X) = EX + |EX| = 2\max\{0, EX\}$. Then $\mathcal{S}(X) = EX$ and
offers us nothing new.

For $\mathcal{R}(X) = \mathrm{VaR}_\alpha^-(X)$, on the other hand, we have $\mathcal{D}(X) = \mathrm{VaR}_\alpha^-(X - EX)$ and could take
$\mathcal{E}(X) = \mathrm{VaR}_\alpha^-(X - EX) + |EX|$ and correspondingly $\mathcal{V}(X) = \mathrm{VaR}_\alpha^-(X) + 2\max\{0, EX\}$. However,
then we merely have $\mathcal{S}(X) = EX$. Some different and more interesting $\mathcal{V}(X)$ might project onto
$\mathcal{R}(X) = \mathrm{VaR}_\alpha^-(X)$ through the formula in Diagram 3, but this remains to be seen.29

In a similar vein, it might be wondered whether the expression $\mathrm{VaR}_\alpha(X) - \mathrm{VaR}_{1-\alpha}(X)$ appearing
in the statistic of Example 11 could serve as the deviation measure $\mathcal{D}(X)$ in some quadrangle, since
it is nonnegative and vanishes for constant $X$. Again the answer is yes, but perhaps only trivially.

Anyway, the most important guideline for additional quadrangle examples is that the quantifiers
must fit with the descriptions in Diagram 2, which have yet to be fleshed out with appeals to specific
mathematical properties. That is our task in the coming section. Those properties have to make
sense in applications and lead to a sturdy methodology, and the real trouble with $\mathcal{R}(X) = EX$
and $\mathcal{R}(X) = \mathrm{VaR}_\alpha^-(X)$ as measures of risk is that they fall short of meeting such a standard. The
Quadrangle Theorem of the coming Section 3, our central result, will therefore not apply to them.

3  The  Main  Properties  and  Relationships 

This  section  is  devoted  to  laying  a  rigorous  foundation  for  the  elements  of  the  risk  quadrangle  and 
their  interconnections.  It  also  furnishes  tools  for  generating  additional  quadrangles  from  given  ones. 

In working with random variables we adopt the standard model in probability theory, which
interprets them as functions on a probability space. Specifically, we suppose there is an underlying
space $\Omega$ with elements $\omega$ standing for future states, or scenarios, along with a measure which assigns
probabilities to various subsets of $\Omega$. There is no loss of generality in this, but technicalities come in
which we wish to avoid getting too occupied with at present.30 Random variables from now on are

29No  claim  is  made  about  there  being  a  unique  £  projecting  onto  some  T>,  or  a  unique  V  projecting  onto  some  77,  and 
indeed  that  must  not  be  hoped  for.  The  real  issue  instead  is  that  of  determining  an  “natural”  antecedent  with  valuable 
characteristics.  For  instance,  any  risk  measure  77  can  be  projected  from  V(X)  =  77(A)  +  A|2AA|  and  any  deviation 
measure  from  £(X)  =  7?(A')  +  A|77Aj  for  arbitrary  A  >  0,  with  the  pointless  consequence  that  5(A)  =  EX. 

30More  explanation  is  provided  in  Section  6,  which  also  offers  motivation  and  examples  for  readers  who  might  not  be 
so  familiar  with  this  way  of  thinking. 
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functions  X  :  H  -A  1R,  but  we  restrict  attention  to  those  for  which  XIX2]  <  oo,  indicating  this  by 
X  G  £2(H).  Here  E  is  the  expectation  with  respect  to  the  background  probability  measure  on  H.31 

Any $X \in \mathcal{L}^2(\Omega)$ also has $E|X| < \infty$, so that $EX$ is well defined and finite. Furthermore, the
variance $\sigma^2(X) = E[(X - EX)^2]$ and its square root, the standard deviation $\sigma(X)$, are well defined and
finite.32 These expressions characterize the natural (“strong”) convergence in $\mathcal{L}^2(\Omega)$ of a sequence of
random variables $X_k$ to a random variable $X$:


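\[
\mathcal{L}^2\text{-}\lim_{k\to\infty} X_k = X
\iff \lim_{k\to\infty} \|X_k - X\|_2 = 0
\iff \lim_{k\to\infty} E[X_k - X] = 0 \ \text{ and } \ \lim_{k\to\infty} \sigma(X_k - X) = 0.
\tag{3.1}
\]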


In many applications $\Omega$ may consist of finitely many elements $\omega$, each having a positive probability
weight. The choice of norm makes no difference then, because $\mathcal{L}^2(\Omega)$ is finite-dimensional.33
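The equivalence in (3.1) rests on the fact that the squared norm splits into squared mean plus variance. The following Python sketch checks that identity on a finite $\Omega$. The three-scenario probability space and the values of $X$ are hypothetical, chosen only for illustration, and the norm is assumed to be the one induced by the inner product $E[XY]$, that is, $\|X\|_2 = (E[X^2])^{1/2}$.

```python
import math

# Hypothetical three-scenario probability space (illustrative values only).
probs = [0.2, 0.5, 0.3]
x = [1.0, -2.0, 4.0]

def expectation(values, weights):
    """E[X] on a finite probability space."""
    return sum(w * v for v, w in zip(values, weights))

def l2_norm(values, weights):
    """||X||_2 = sqrt(E[X^2]), the norm induced by <X, Y> = E[XY] (assumed here)."""
    return math.sqrt(expectation([v * v for v in values], weights))

def std_dev(values, weights):
    """sigma(X) = sqrt(E[(X - EX)^2])."""
    m = expectation(values, weights)
    return math.sqrt(expectation([(v - m) ** 2 for v in values], weights))

mean = expectation(x, probs)
sigma = std_dev(x, probs)
norm = l2_norm(x, probs)

# ||X||_2^2 = (EX)^2 + sigma(X)^2, so the norm tends to 0 exactly when
# both the mean and the standard deviation tend to 0.
identity_gap = abs(norm ** 2 - (mean ** 2 + sigma ** 2))
print(mean, sigma, norm, identity_gap)
```

Applied to $X_k - X$ in place of $X$, the same identity gives the second equivalence in (3.1).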

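The quantifiers $\mathcal{R}$, $\mathcal{D}$, $\mathcal{V}$ and $\mathcal{E}$, all of which assign numerical values, possibly including $+\infty$,34 to
random variables $X$, are said to be “functionals” on $\mathcal{L}^2(\Omega)$. Some of the properties that come up may
be shared, so it is expedient to state them in terms of a general functional $\mathcal{F} : \mathcal{L}^2(\Omega) \to (-\infty, \infty]$:

• $\mathcal{F}$ is convex if $\mathcal{F}((1-\tau)X + \tau X') \le (1-\tau)\mathcal{F}(X) + \tau\mathcal{F}(X')$ for all $X$, $X'$, and $\tau \in (0, 1)$.35

• $\mathcal{F}$ is positively homogeneous if $\mathcal{F}(0) = 0$ and $\mathcal{F}(\lambda X) = \lambda\mathcal{F}(X)$ for all $\lambda \in (0, \infty)$.

• $\mathcal{F}$ is subadditive if $\mathcal{F}(X + X') \le \mathcal{F}(X) + \mathcal{F}(X')$ for all $X$, $X'$.

• $\mathcal{F}$ is monotonic (nondecreasing, here) if $\mathcal{F}(X) \le \mathcal{F}(X')$ when $X \le X'$.36

• $\mathcal{F}$ is closed if, for all $C \in \mathbb{R}$, the set $\{ X \mid \mathcal{F}(X) \le C \}$ is closed.37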

Convexity  will  be  valuable  for  much  of  what  we  undertake.  Positive  homogeneity  is  a  more  special 
property  which,  in  the  study  of  risk,  was  emphasized  more  in  the  past  than  now.  An  elementary  fact 
of  convex  analysis  is  that 


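\[
\mathcal{F} \text{ convex + positively homogeneous}
\iff
\mathcal{F} \text{ subadditive + positively homogeneous}.
\tag{3.2}
\]

The combinations in (3.2) are equivalent to sublinearity: $\mathcal{F}\big(\sum_k \lambda_k X_k\big) \le \sum_k \lambda_k \mathcal{F}(X_k)$ for $\lambda_k \ge 0$.38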
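The properties listed above can be probed numerically on a finite $\Omega$. The sketch below uses the functional $\mathcal{F}(X) = EX + \sigma(X)$ purely as a test case on a hypothetical space of four equally likely scenarios. It samples random pairs to look for violations of convexity, positive homogeneity and subadditivity, and then exhibits one pair showing that this particular $\mathcal{F}$ is not monotonic. A random search can only fail to find counterexamples; it is not a proof.

```python
import math
import random

# Hypothetical finite probability space with four equally likely scenarios.
PROBS = [0.25, 0.25, 0.25, 0.25]

def mean(x):
    return sum(p * v for p, v in zip(PROBS, x))

def sigma(x):
    m = mean(x)
    return math.sqrt(sum(p * (v - m) ** 2 for p, v in zip(PROBS, x)))

def functional(x):
    """F(X) = EX + sigma(X), used here only as a test case."""
    return mean(x) + sigma(x)

def looks_sublinear(trials=2000, tol=1e-9, seed=0):
    """Randomly test convexity, positive homogeneity and subadditivity of F."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in PROBS]
        y = [rng.uniform(-5.0, 5.0) for _ in PROBS]
        tau = rng.random()
        lam = rng.uniform(0.1, 10.0)
        mix = [(1.0 - tau) * a + tau * b for a, b in zip(x, y)]
        if functional(mix) > (1.0 - tau) * functional(x) + tau * functional(y) + tol:
            return False
        if abs(functional([lam * a for a in x]) - lam * functional(x)) > tol:
            return False
        if functional([a + b for a, b in zip(x, y)]) > functional(x) + functional(y) + tol:
            return False
    return True

sublinear_ok = looks_sublinear()

# Monotonicity fails: X <= 0 in every scenario, yet F(X) > F(0) = 0.
x_nonpositive = [-4.0, 0.0, 0.0, 0.0]
violates_monotonicity = functional(x_nonpositive) > functional([0.0] * 4)
print(sublinear_ok, violates_monotonicity)
```

Here $\mathcal{F}(X) = -1 + \sqrt{3} > 0$ for the nonpositive $X$ shown, so sublinearity by itself does not give monotonicity.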

Other  important  consequences  of  convexity  emerge  only  in  combination  with  closedness.  One  that 
will  be  applied  in  several  ways  is  the  following  rule  coming  out  of  convex  analysis.39 


If $\mathcal{F}$ is closed convex, and if $X_0$, $Y$, $c$ make the function $f(t) = \mathcal{F}(X_0 + tY) - tc$ be
bounded above for $t \in [0, \infty)$, then $\mathcal{F}(X + tY) - tc \le \mathcal{F}(X)$ for all $X$ and $t \in [0, \infty)$.   (3.3)


31The inner product between two elements $X$ and $Y$ of $\mathcal{L}^2(\Omega)$ is $\langle X, Y \rangle = E[XY]$.

32It might be wondered why we insist on boundedness of second moments when requiring only $E|X| < \infty$ would cover
a larger class of random variables. The main reason is that this leads to a simpler exposition in Section 6, when we come
to the dualization of risk in terms of sets of probability densities $Q$ (having $Q \ge 0$, $EQ = 1$). With the finiteness of
$E|X|$ as the only requirement we would be limited there to bounded densities $Q$. It would be better really if we could
draw on all possible densities $Q$, but that would force us to go to the opposite extreme of requiring $X$ to be essentially
bounded. The choice made here is a workable compromise.

33In finite dimensions, all norms give the same convergence.

34This feature helps to make our choice of $\mathcal{L}^2(\Omega)$ as the underlying space much less restrictive than might be imagined.

35In expressions like this, a sum of values in $(-\infty, \infty]$ is $\infty$ if any of them is $\infty$. Also, $\lambda\infty = \infty$ for $\lambda > 0$.

36This inequality is to be interpreted in the “almost sure” sense, meaning that the set of $\omega \in \Omega$ for which $X(\omega) \le X'(\omega)$
has probability 1.

37This property is also called lower semicontinuity. A subset of $\mathcal{L}^2(\Omega)$ is closed when it contains all limits of its
sequences in the sense of (3.1). For convex sets, weak limits give the same closedness as those strong limits.

38Under the convention, if necessary, that $0 \cdot \infty = 0$.

39Apply Theorem 8.6 of Rockafellar [1970] to the function $f(s, t) = \mathcal{F}(sX + tY)$.

An immediate consequence, for instance, is that40

for $\mathcal{F}$ closed convex: if $\mathcal{F}(X) \le 0$ whenever $X \le 0$, then $\mathcal{F}$ is monotonic.   (3.4)

To assist with closedness, it may help to note that this property of $\mathcal{F}$ holds when $\mathcal{F}$ is continuous,41 and
moreover, as long as $\mathcal{F}$ does not take on $\infty$, that stronger property is automatic in broad circumstances
of interest to us. Namely,42

$\mathcal{F}$ is continuous when
    $\mathcal{F}$ is finite, convex, and closed, or
    $\mathcal{F}$ is finite, convex, and monotonic, or   (3.5)
    $\mathcal{F}$ is finite, convex, and $\Omega$ is finite.

Closedness can also be approached through so-called “weak” convergence in place of the “strong”
convergence described by (3.1), since the closedness of convex sets is known to be the same either way.
Weak convergence of $X_k$ to $X$ means that $E[X_k Q] \to E[XQ]$ for all $Q \in \mathcal{L}^2(\Omega)$. In fact it suffices in this
to restrict attention to $Q \ge 0$ with $EQ = 1$, inasmuch as linear combinations of such $Q$ fill up all of
$\mathcal{L}^2(\Omega)$. That will be especially meaningful in Section 6, where $Q$ of this type will be interpreted as the
density with respect to $P_0$ of an alternative probability measure $P$.

Measures of risk. The role of a measure of risk, $\mathcal{R}$, is to assign to a random variable $X$, standing
for an uncertain “cost” or “loss,” a numerical value $\mathcal{R}(X)$ that can serve as a surrogate for overall
(net) cost or loss. However, the assignment must meet reasonable standards in order to make sense.

The class of coherent measures of risk has attracted wide attention in finance in this regard. A
functional $\mathcal{R}$ belongs to this class, as introduced in Artzner et al. [1999], if it is convex and positively
homogeneous (or equivalently by (3.2) subadditive and positively homogeneous), as well as monotonic,
and, in addition, satisfies43

$\mathcal{R}(X + C) = \mathcal{R}(X) + C$ for all $X$ and constants $C$.   (3.6)

Closedness of $\mathcal{R}$ was not mentioned in Artzner et al. [1999], but the context there supposed $\mathcal{R}$ to
be finite (and actually $\Omega$ finite, too), so that closedness and even continuity of $\mathcal{R}$ were implied by
coherency through (3.5).44 Subsequent researchers considered dropping the positive homogeneity, and
with it the term “coherent,” speaking then of a “convex measure of risk” or a “convex risk function,”
cf. Föllmer and Schied [2004], Ruszczyński and Shapiro [2006a].45 However, without denying the
importance of these ideas, we will organize assumptions and terminology a bit differently here. The
crucial role that $EX$ has in the fundamental risk quadrangle is our guide, along with the importance
of “closedness” in dealing with functionals that might take on $\infty$.46

40Consider the case of (3.3) with $X_0 = 0$, $Y \le 0$, and $c = 0$.

41Continuity of $\mathcal{F}$ means that $\mathcal{F}(X_k) \to \mathcal{F}(X)$ whenever $X_k \to X$ as in (3.1).

42For the first: [Rockafellar, 1974, Corollary 8B]. For the second: [Ruszczyński and Shapiro, 2006a, Proposition 3.1].
For the third: [Rockafellar, 1970, Theorem 10.1], recalling that $\mathcal{L}^2(\Omega)$ is finite-dimensional when $\Omega$ is finite.

43A slightly different, but ultimately equivalent property was originally formulated in Artzner et al. [1999]. Note that
positive homogeneity enables the units of measurement of $X$ to be the same as those of $\mathcal{R}(X)$.

44Coherency was extended to general $\Omega$ in Delbaen [2002] with $X$ restricted to $\mathcal{L}^\infty(\Omega)$ and $\mathcal{R}$ still finite-valued, in
which case $\mathcal{R}$ is likewise continuous by [Ruszczyński and Shapiro, 2006a, Proposition 3.1]. That framework was also
maintained in Föllmer and Schied [2004].

45In our view, the idea behind “coherency” is oriented to monotonicity plus convexity. In Rockafellar [2007], risk
measures satisfying the axioms of coherency except for positive homogeneity were called coherent in the extended sense.

46Another reason is that the “convex risk measure” terminology insists on monotonicity, but we want a framework
that, for the sake of broad understanding, encompasses some risk measures without monotonicity, such as $\mathcal{R}(X) =
\mu(X) + \lambda\sigma(X)$.

By a regular measure of risk we will mean a functional $\mathcal{R}$ with values in $(-\infty, \infty]$ that is closed
convex with

$\mathcal{R}(C) = C$ for constants $C$   (3.7)

and furthermore

$\mathcal{R}(X) > EX$ for nonconstant $X$.   (3.8)

Property (3.8) is aversity to risk.47 Observe that (3.7) implies the seemingly stronger property (3.6)
of Artzner et al. [1999] by the rule in (3.3)48 and therefore entails

$\mathcal{R}(X - EX) = \mathcal{R}(X) - EX$ for all $X$   (3.9)

in particular. An advantage of stipulating (3.7) in place of (3.6) lies in motivation. The surrogate cost
value that a measure of risk should assign to a random variable that always comes out with the value
$C$ ought to be $C$ itself.

In all of the Examples 1-12 above, $\mathcal{R}$ is a regular measure of risk, and in Examples 1-3, 5-6, 10-12,
$\mathcal{R}$ is also positively homogeneous. In Examples 2-3, 5-10 and 12, $\mathcal{R}$ is monotonic, but in Examples
1, 4 and 11 it is not. Only the risk measures in Examples 2-3, 5-6, 10 and 12 are coherent in the
sense of Artzner et al. [1999]. For $\mathcal{R} = \bar{q}_\alpha = \mathrm{CVaR}_\alpha$ in Example 2, this was perceived from several
angles that eventually came together; see Pflug [2000], Acerbi and Tasche [2002] and Rockafellar and
Uryasev [2002]. For the risk measure in Example 12, the coherency was established by Krokhmal [2007].

An example of a coherent measure of risk that is not regular is $\mathcal{R}(X) = EX$, which lacks aversity.
On the other hand, $\mathcal{R}(X) = \mathrm{VaR}_\alpha(X)$ fails to be a regular measure of risk by lacking closedness,
convexity and the aversity in (3.8), in general, although it does have positive homogeneity, satisfies
(3.6) and is monotonic. It fails to be a coherent measure of risk through the absence of convexity.
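A standard small example makes the failure of $\mathrm{VaR}_\alpha$ concrete. The Python sketch below considers two independent loans, each losing 100 with probability 0.04; these numbers are hypothetical. It takes $\mathrm{VaR}_\alpha$ to be the lower $\alpha$-quantile, which is an assumption about the convention in use. Since $\mathrm{VaR}_\alpha$ is positively homogeneous, a failure of subadditivity is by (3.2) also a failure of convexity.

```python
# Four scenarios for two independent loans, each defaulting with probability 0.04
# (hypothetical numbers): neither defaults, only loan 2, only loan 1, both.
probs = [0.96 * 0.96, 0.96 * 0.04, 0.04 * 0.96, 0.04 * 0.04]
loss1 = [0.0, 0.0, 100.0, 100.0]
loss2 = [0.0, 100.0, 0.0, 100.0]
total = [a + b for a, b in zip(loss1, loss2)]

def expectation(values, weights):
    return sum(w * v for v, w in zip(values, weights))

def value_at_risk(values, weights, alpha):
    """Lower alpha-quantile: the smallest x with P(X <= x) >= alpha."""
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= alpha - 1e-12:  # small slack for rounding
            return value
    return max(values)

alpha = 0.95
var1 = value_at_risk(loss1, probs, alpha)
var2 = value_at_risk(loss2, probs, alpha)
var_total = value_at_risk(total, probs, alpha)
mean1 = expectation(loss1, probs)

print(var1, var2, var_total, mean1)
```

Each loan alone has $\mathrm{VaR}_{0.95} = 0$ because it defaults with probability below 0.05, while the sum has $\mathrm{VaR}_{0.95} = 100$, so subadditivity fails. The same data show the lack of aversity: $\mathrm{VaR}_{0.95}$ of a single loan is 0 although its expected loss is 4.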

Measures of deviation. The role of a measure of deviation, $\mathcal{D}$, is to quantify the nonconstancy
(as the uncertainty) in a random variable $X$.49 By a regular measure of deviation we will mean a
functional $\mathcal{D}$ with values in $[0, \infty]$ that is closed convex with

$\mathcal{D}(C) = 0$ for constants $C$, but $\mathcal{D}(X) > 0$ for nonconstant $X$.   (3.10)

The measures of deviation in Examples 1-12 all fit this prescription. Note that symmetry is not
required: perhaps $\mathcal{D}(-X) \ne \mathcal{D}(X)$.

Measures of error. The role of a measure of error, $\mathcal{E}$, is to quantify the nonzeroness in a random
variable $X$.50 By a regular measure of error we will mean a functional $\mathcal{E}$ with values in $[0, \infty]$ that is
closed convex with

$\mathcal{E}(0) = 0$ but $\mathcal{E}(X) > 0$ when $X \not\equiv 0$   (3.11)

and satisfies for sequences of random variables $\{X_k\}_{k=1}^\infty$ the condition that

\[
\lim_{k\to\infty} \mathcal{E}(X_k) = 0 \implies \lim_{k\to\infty} EX_k = 0.
\tag{3.12}
\]

47Risk measures satisfying this condition were introduced as averse measures of risk in Rockafellar et al. [2006a]. A
constant random variable $X = C$ has $\mathcal{R}(X) = EX$ by (3.7).

48In (3.3) with $\mathcal{F} = \mathcal{R}$ and $X_0 = 0$, first take $Y = C$ and $c = C$ for any $C$. Since (3.7) gives $\mathcal{R}(0 + tC) - tC = 0$, it
follows that $\mathcal{R}(X + C) - C \le \mathcal{R}(X)$ for all $X$ and $C$. Applying this next to $X + C$ and $-C$ in place of $X$ and $C$, get
$\mathcal{R}(X) + C \le \mathcal{R}(X + C)$, hence an equation.

49Deviation measures as a special class of functionals were introduced in Rockafellar et al. [2002] with follow-up in
Rockafellar et al. [2006a].

50Measures of error in such general terms were introduced in Rockafellar et al. [2008].

The latter requirement, meaning that random variables $X$ with $|EX|$ bounded away from 0 cannot be
arbitrarily close to 0 as measured by $\mathcal{E}(X)$, will enter into the projection from $\mathcal{E}$ to $\mathcal{D}$ that is featured
on the right side of the quadrangle. It is equivalent actually to the seemingly stronger property that
$\mathcal{E}(X) \ge \psi(EX)$ for a convex function $\psi$ on $(-\infty, \infty)$ having $\psi(0) = 0$ but $\psi(t) > 0$ for $t \ne 0$.51 In
common situations it holds automatically, as for instance when $\Omega$ is finite,52 or in the expectation case
with $\mathcal{E}(X) = E[e(X)]$ for a convex function $e$ on $(-\infty, \infty)$ having $e(0) = 0$ but $e(x) > 0$ for $x \ne 0$.53
In Examples 1-12 every measure of error is regular, but some cases can have $\mathcal{E}(-X) \ne \mathcal{E}(X)$.

Measures of regret and relative utility. The role of a measure of regret, $\mathcal{V}$, is to quantify
the displeasure associated with the mixture of potential positive, zero and negative outcomes of a
random variable $X$ that stands for an uncertain cost or loss. Regret in this sense is close to the notion
of an overall penalty, but it might sometimes come out negative and therefore act as a reward. As
mentioned in the introduction, regret is the flip side of relative utility. Measures of regret $\mathcal{V}$ correspond
to measures of relative utility $\mathcal{U}$ through

\[
\mathcal{V}(X) = -\mathcal{U}(-X), \qquad \mathcal{U}(Y) = -\mathcal{V}(-Y),
\tag{3.13}
\]

where $Y$ denotes a random variable oriented toward uncertain gain instead of loss. Everything said
about regret could be conveyed instead in the language of utility, but that would trigger switches of
orientation between loss and gain together with tedious minus signs coming from (3.13).

By a regular measure of regret we will mean a functional $\mathcal{V}$ with values in $(-\infty, \infty]$ that is closed
convex, has the aversity property that

$\mathcal{V}(0) = 0$ but $\mathcal{V}(X) > EX$ when $X \not\equiv 0$,   (3.14)

and satisfies for sequences of random variables $\{X_k\}_{k=1}^\infty$ the condition that

\[
\lim_{k\to\infty} \big[ \mathcal{V}(X_k) - EX_k \big] = 0 \implies \lim_{k\to\infty} EX_k = 0.
\tag{3.15}
\]

The limit condition parallels the one in (3.12) and likewise is automatic when $\Omega$ is finite, or in the
expectation case where $\mathcal{V}(X) = E[v(X)]$ for a convex function $v$ on $(-\infty, \infty)$ having $v(0) = 0$ but
$v(x) > x$ for $x \ne 0$. All the measures of regret in Examples 1-12 are regular.
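To see the flip in (3.13) and the aversity in (3.14) at work in the expectation case, the following Python sketch starts from an exponential relative utility $u(y) = 1 - e^{-y}$. This choice of $u$, the three-scenario probability space and the loss values are all assumptions made only for illustration. The induced regret is $v(x) = -u(-x) = e^x - 1$.

```python
import math

def u(y):
    """Hypothetical relative utility of a sure gain y: u(0) = 0 and u(y) < y otherwise."""
    return 1.0 - math.exp(-y)

def v(x):
    """Regret for a sure loss x, obtained by flipping u as in (3.13): v(x) = -u(-x)."""
    return -u(-x)

def expectation(values, weights):
    return sum(w * t for t, w in zip(values, weights))

# Hypothetical loss variable X on three scenarios.
probs = [0.3, 0.4, 0.3]
loss = [-1.0, 0.5, 2.0]

regret = expectation([v(t) for t in loss], probs)            # V(X) = E[v(X)]
utility_of_gain = expectation([u(-t) for t in loss], probs)  # U(-X) = E[u(-X)]
mean_loss = expectation(loss, probs)

# Pointwise aversity v(x) > x on a grid of nonzero points.
grid = [k / 10.0 for k in range(-50, 51) if k != 0]
pointwise_averse = all(v(t) > t for t in grid)

print(regret, -utility_of_gain, mean_loss, pointwise_averse)
```

The computed regret equals $-\mathcal{U}(-X)$ and exceeds $EX$, as (3.13) and (3.14) require for this nonconstant $X$.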

As with measures of risk $\mathcal{R}$, there is strong incentive for asking $\mathcal{V}$ also to be monotonic. That
additional property holds for the measures of regret in Examples 2-3, 5-10 and 12, but not in Examples
1, 4 and 11.54

By a regular measure of relative utility we will mean a functional $\mathcal{U}$ having the “flipped” properties
that correspond to those of a regular measure of regret $\mathcal{V}$ through (3.13).55

Quadrangle Theorem.

(a) The relations $\mathcal{D}(X) = \mathcal{R}(X) - EX$ and $\mathcal{R}(X) = EX + \mathcal{D}(X)$ give a one-to-one correspondence
between regular measures of risk $\mathcal{R}$ and regular measures of deviation $\mathcal{D}$. In this correspondence, $\mathcal{R}$
is positively homogeneous if and only if $\mathcal{D}$ is positively homogeneous. On the other hand,

$\mathcal{R}$ is monotonic if and only if $\mathcal{D}(X) \le \sup X - EX$ for all $X$.   (3.16)

51From (3.12) the function $\psi(t) = \inf\{ \mathcal{E}(X) \mid EX = t \}$ has these properties.

52The finite-dimensionality of $\mathcal{L}^2(\Omega)$ and the closed convexity of $\mathcal{E}$ in combination with (3.11) ensure then that the lower
level sets of $\mathcal{E}$ are compact.

53Then $E[e(X)] \ge e(EX)$ by Jensen’s Inequality.

54There is potential motivation sometimes for working without such monotonicity, as will be explained in Section 5.

55More details on this will be provided in Section 4.

(b) The relations $\mathcal{E}(X) = \mathcal{V}(X) - EX$ and $\mathcal{V}(X) = EX + \mathcal{E}(X)$ give a one-to-one correspondence
between regular measures of regret $\mathcal{V}$ and regular measures of error $\mathcal{E}$. In this correspondence, $\mathcal{V}$ is
positively homogeneous if and only if $\mathcal{E}$ is positively homogeneous. On the other hand,

$\mathcal{V}$ is monotonic if and only if $\mathcal{E}(X) \le |EX|$ for $X \le 0$.   (3.17)

(c) For any regular measure of regret $\mathcal{V}$, a regular measure of risk $\mathcal{R}$ is obtained by

\[
\mathcal{R}(X) = \min_C \{\, C + \mathcal{V}(X - C) \,\}.
\tag{3.18}
\]

If $\mathcal{V}$ is positively homogeneous, $\mathcal{R}$ is positively homogeneous. If $\mathcal{V}$ is monotonic, $\mathcal{R}$ is monotonic.

(d) For any regular measure of error $\mathcal{E}$, a regular measure of deviation $\mathcal{D}$ is obtained by

\[
\mathcal{D}(X) = \min_C \{\, \mathcal{E}(X - C) \,\}.
\tag{3.19}
\]

If $\mathcal{E}$ is positively homogeneous, $\mathcal{D}$ is positively homogeneous. If $\mathcal{E}$ satisfies the condition in (3.17),
then $\mathcal{D}$ satisfies the condition in (3.16).

(e) In both (c) and (d), as long as the expression being minimized is finite for some $C$, the set of $C$
values for which the minimum is attained is a nonempty, closed, bounded interval.56 Moreover when
$\mathcal{V}$ and $\mathcal{E}$ are paired as in (b), the interval comes out the same and gives the associated statistic:

\[
\operatorname*{argmin}_C \{\, C + \mathcal{V}(X - C) \,\} = \mathcal{S}(X) = \operatorname*{argmin}_C \{\, \mathcal{E}(X - C) \,\},
\quad \text{with } \mathcal{S}(X + C) = \mathcal{S}(X) + C.
\tag{3.20}
\]
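The two projections in (c) and (d) and the shared statistic in (e) can be traced numerically. The Python sketch below assumes the expectation-type regret $\mathcal{V}(X) = \frac{1}{1-\alpha} E[\max\{0, X\}]$ (the one behind formula (3.21)), pairs it with $\mathcal{E}(X) = \mathcal{V}(X) - EX$ as in (b), and minimizes over $C$. The four-scenario distribution and $\alpha = 0.8$ are hypothetical. Because both objectives are convex and piecewise linear in $C$ with kinks only at the values taken by $X$, it is enough to evaluate them at those values.

```python
ALPHA = 0.8
# Hypothetical four-scenario distribution of a loss X.
probs = [0.4, 0.3, 0.2, 0.1]
x = [-1.0, 0.0, 2.0, 5.0]

def expectation(values, weights):
    return sum(w * v for v, w in zip(values, weights))

def regret(values, weights):
    """V(X) = E[max(X, 0)] / (1 - alpha), an expectation-type regret (assumed here)."""
    return expectation([max(v, 0.0) for v in values], weights) / (1.0 - ALPHA)

def error(values, weights):
    """E(X) = V(X) - EX, the pairing in part (b)."""
    return regret(values, weights) - expectation(values, weights)

def minimize_over_atoms(objective, values):
    """Minimize a convex piecewise-linear function of C whose kinks are the atoms of X."""
    best_value, best_c = None, None
    for c in sorted(set(values)):
        value = objective(c)
        if best_value is None or value < best_value - 1e-12:
            best_value, best_c = value, c
    return best_value, best_c

# Projection (3.18): risk from regret.
risk, stat_from_regret = minimize_over_atoms(
    lambda c: c + regret([v - c for v in x], probs), x)
# Projection (3.19): deviation from error.
deviation, stat_from_error = minimize_over_atoms(
    lambda c: error([v - c for v in x], probs), x)
mean = expectation(x, probs)

print(risk, deviation, mean, stat_from_regret, stat_from_error)
```

In this instance the two minimizations pick out the same $C$, and the minimum values differ exactly by $EX$, in line with $\mathcal{D}(X) = \mathcal{R}(X) - EX$ from part (a).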

This theorem integrates, in a new and revealing way, various results or partial results that were
separately developed elsewhere, and in many instances only for positively homogeneous quantifiers.
The correspondence between $\mathcal{R}$ and $\mathcal{D}$ in part (a) was officially presented in Rockafellar et al. [2006a]
after being laid out much earlier in the unpublished report Rockafellar et al. [2002].57 The results
in parts (d) and (e) about projecting from $\mathcal{E}$ to $\mathcal{D}$ come from Rockafellar et al. [2008], where they
were employed in generalized linear regression.58 The observation in part (b) immediately translates
them to the results in parts (c) and (e) about projecting from $\mathcal{V}$ to $\mathcal{R}$. However, a general version of
(c) in the positively homogeneous case was separately developed earlier, without that connection, by
Krokhmal [2007].

Although the parallel between $\mathcal{E} \to \mathcal{D}$ and $\mathcal{V} \to \mathcal{R}$, which ties the two sides of the quadrangle
fully together, is mathematically elementary, it has not come into focus easily despite its conceptual
significance. That, especially, is where the theorem innovates. What was absent in the past was
the broad concept of a measure of regret, not limited to an expectation, and the realization it could
anchor a fourth corner in the relationships, thereby serving as a conduit for bringing in “utility”
beyond expected utility.

56Typically this interval reduces to a single point.

57As in those works, even though they only looked at the positively homogeneous case, the justification of (3.16) follows
by applying (3.4) to $\mathcal{F} = \mathcal{R}$. The justification of (3.17) works the same way with $\mathcal{F} = \mathcal{V}$ in (3.4).

58The only real effort in the proof of the projection claims is in showing that, when $\mathcal{D}$ comes from (3.19), the minimum
over $C$ is attained and $\mathcal{D}$ inherits the closedness of $\mathcal{E}$. This draws on (3.12). The argument in Rockafellar et al. [2008]
utilized positive homogeneity, but it is readily generalized as follows through the existence under (3.12) of a convex
function $\psi$ with $\psi(0) = 0$, $\psi(t) > 0$ for $t \ne 0$, such that $\mathcal{E}(X) \ge \psi(EX)$. The level sets $\{ t \mid \psi(t) \le c \}$ are then bounded.

Observe first that if a sequence of finite error values $\mathcal{E}(X - C_k)$ approaches the minimum with respect to $C$, it is a
bounded sequence and therefore, since $\mathcal{E}(X - C_k) \ge \psi(EX - C_k)$, the sequence of expected values $E[X - C_k]$ is bounded.
Then the sequence $\{C_k\}$ is bounded, so a subsequence will converge to some $C$. That $C$ gives the minimum, due to
the closedness of $\mathcal{E}$.

Next fix a value $c \in \mathbb{R}$ and suppose that $X_k \to X$ with $\mathcal{D}(X_k) \le c$ for $k = 1, 2, \ldots$. The issue is whether $\mathcal{D}(X) \le c$.
For each $k$ there is a $C_k$ with $\mathcal{D}(X_k) = \mathcal{E}(X_k - C_k)$, and those error values are bounded then by $c$. In consequence, the
sequence of values $E[X_k - C_k]$ is bounded. Since $X_k \to X$, hence $EX_k \to EX$, it follows that a subsequence of $\{C_k\}$
has to converge to some $C$, in which case the corresponding subsequence of $\{X_k - C_k\}$ converges to $X - C$. The
closedness of $\mathcal{E}$ ensures that $\mathcal{E}(X - C) \le c$ and hence $\mathcal{D}(X) \le c$, as required.

Risk measure formulas of type (3.18) with accompaniment in (3.20) have gradually emerged without
any thought that they might be connected somehow with generalized regression. The first such formula
was presented in Rockafellar and Uryasev [2000] and its follow-up Rockafellar and Uryasev [2002],59

\[
\begin{aligned}
\mathrm{CVaR}_\alpha(X) &= \min_C \Big\{ C + \frac{1}{1-\alpha} E[X - C]_+ \Big\}, \\
\mathrm{VaR}_\alpha(X) &= \operatorname*{argmin}_C \Big\{ C + \frac{1}{1-\alpha} E[X - C]_+ \Big\}.
\end{aligned}
\tag{3.21}
\]
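Formula (3.21) is easy to try out on a sample. The Python sketch below treats ten hypothetical loss values as equally likely scenarios, evaluates $C + \frac{1}{1-\alpha}E[X - C]_+$ with $[t]_+ = \max\{0, t\}$ at each sample value, and compares the minimum with the average of the worst $(1-\alpha)$ fraction of the sample. That comparison assumes $n(1-\alpha)$ is an integer, as it is here with $n = 10$ and $\alpha = 0.8$.

```python
ALPHA = 0.8
# Hypothetical sample of losses, treated as ten equally likely scenarios.
sample = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 0.0]

def objective(c, values, alpha):
    """C + E[X - C]_+ / (1 - alpha) for equally likely sample values."""
    excess = sum(max(v - c, 0.0) for v in values) / len(values)
    return c + excess / (1.0 - alpha)

# The objective is convex and piecewise linear in C with kinks at the sample
# values, so its minimum is attained at one of them.
candidates = sorted(set(sample))
values_at_candidates = [objective(c, sample, ALPHA) for c in candidates]
cvar = min(values_at_candidates)
argmin_atoms = [c for c, val in zip(candidates, values_at_candidates)
                if val <= cvar + 1e-9]

# Direct tail average of the worst n * (1 - alpha) outcomes, for comparison.
tail_size = round(len(sample) * (1.0 - ALPHA))
tail_average = sum(sorted(sample)[-tail_size:]) / tail_size

print(cvar, argmin_atoms, tail_average)
```

Two sample values attain the minimum here, and by convexity the objective is flat between them. This is an instance of the argmin being an interval rather than a single point, as allowed by part (e) of the Quadrangle Theorem.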

We later learned that the “argmin” part of this was already known in the statistics of quantile regression,
cf. Koenker and Bassett [1978], Koenker [2005], but with the minimization expression differing
from  ours  by  a  positive  factor;  the  associated  “min”  quantity  got  no  attention  in  that  subject.  In  those 
days  we  were  mainly  occupied  with  the  numerical  usefulness  of  (3.21)  in  solving  problems  of  stochastic 
optimization  involving  VaR  and  CVaR  and  were  looking  no  further  in  the  direction  of  statistics. 

Earlier, on a different frontier, the concept of “optimized certainty equivalent” was defined in Ben-Tal
and Teboulle [1991] by a trade-off formula very much like the one for getting $\mathcal{S}$ from $\mathcal{V}$ but focused
on  expected  utility  (“normalized”)  and  maximization,  instead  of  general  regret  and  minimization.  It 
was  applied  to  problems  of  optimization  in  Ben-Tal  and  Ben-Israel  [1991]  and  subsequently  Ben-Tal 
and  Ben-Israel  [1997].  Much  later  in  Ben-Tal  and  Teboulle  [2007],  once  the  theory  of  risk  measures 
had  come  into  development,  the  “min”  quantity  in  the  trade-off  received  attention  alongside  of  the 
“argmin,”  and  (3.21)  could  be  cast  as  a  special  case  of  their  previous  work  with  expected  utility.  An 
important  feature  of  that  work,  brought  out  further  in  Ben-Tal  and  Teboulle  [2007],  was  duality  with 
notions  of  information  and  entropy.60 

In  Krokhmal  [2007]  a  much  wider  class  of  trade-off  formulas  for  risk  measures  was  studied  with  the 
aim of generalizing (3.21) through $\mathcal{V}$-type expressions not restricted to the expectation case. In that
research,  as  in  Ben-Tal  and  Teboulle  [2007],  no  connections  with  statistical  theory  were  contemplated. 
In  other  words,  the  bottom  of  the  quadrangle  was  still  out  of  sight. 

It is convenient to speak of the quantifiers at the corners of the fundamental quadrangle, under
the relations in Diagram 3, as constituting a quadrangle quartet $(\mathcal{R}, \mathcal{D}, \mathcal{V}, \mathcal{E})$ with statistic $\mathcal{S}$. In the
regular case portrayed in the Quadrangle Theorem, it is a regular quadrangle quartet. The most
attractive case adds monotonicity to $\mathcal{R}$ and $\mathcal{V}$ along with the corresponding properties of $\mathcal{D}$ and $\mathcal{E}$ in
(3.16) and (3.17); we will then call the quartet monotonic. On the other hand, in the case where the
four  quantifiers  are  positively  homogeneous  we  will  speak  of  a  quartet  with  positive  homogeneity. 

Although  good  examples  of  regular  quadrangle  quartets  with  and  without  monotonicity  have  been 
provided  in  Section  2,  the  question  arises  of  how  additional  examples  might  be  constructed.  We  round 
out  this  section  with  three  results  which  can  assist  in  that  direction. 

The first one is elementary but puts into the proper perspective of an entire quadrangle the
operation of blending risk with expectation that is seen in the formula it gives for $\mathcal{R}(X)$. Such
blending, for instance with $\mathcal{R}_0(X) = \mathrm{CVaR}_\alpha(X)$, has gained some attention in finance.

59In some papers in this area the random variables $X$ were taken as representing uncertain “gains” instead of “losses.”
The resulting formulas are of course equivalent in that case, but minus signs have to be juggled in the translation.

60Here, see the end of Section 6.

Scaling Theorem. Let $(\mathcal{R}_0, \mathcal{D}_0, \mathcal{V}_0, \mathcal{E}_0)$ be a regular quadrangle quartet with statistic $\mathcal{S}_0$ and consider
any $\lambda \in (0, \infty)$. Then a regular quadrangle quartet $(\mathcal{R}, \mathcal{D}, \mathcal{V}, \mathcal{E})$ with statistic $\mathcal{S}$ is given by


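\[
\begin{aligned}
&\mathcal{S}(X) = \mathcal{S}_0(X), \\
&\mathcal{R}(X) = (1-\lambda)EX + \lambda\mathcal{R}_0(X), \qquad \mathcal{D}(X) = \lambda\mathcal{D}_0(X), \\
&\mathcal{V}(X) = (1-\lambda)EX + \lambda\mathcal{V}_0(X), \qquad \mathcal{E}(X) = \lambda\mathcal{E}_0(X),
\end{aligned}
\tag{3.22}
\]

or alternatively by

\[
\begin{aligned}
&\mathcal{S}(X) = \lambda\mathcal{S}_0(\lambda^{-1}X), \\
&\mathcal{R}(X) = \lambda\mathcal{R}_0(\lambda^{-1}X), \qquad \mathcal{D}(X) = \lambda\mathcal{D}_0(\lambda^{-1}X), \\
&\mathcal{V}(X) = \lambda\mathcal{V}_0(\lambda^{-1}X), \qquad \mathcal{E}(X) = \lambda\mathcal{E}_0(\lambda^{-1}X).
\end{aligned}
\tag{3.23}
\]

Monotonicity and positive homogeneity are preserved in these constructions, except that monotonicity
requires $\lambda \le 1$ in (3.22).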

Scaling  as  in  (3.22)  is  present  in  Examples  1,  1' ,  and  could  very  well  be  added  to  Examples  2  and  3. 
The  alternative  form  in  (3.23)  provides  an  enrichment  to  Examples  8  and  9. 
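As a brief check of (3.22), offered here as a sketch rather than as the authors' proof, one can insert $\mathcal{V}(X) = (1-\lambda)EX + \lambda\mathcal{V}_0(X)$ into the projection formula (3.18):

```latex
\begin{align*}
C + \mathcal{V}(X - C)
  &= C + (1-\lambda)\,E[X - C] + \lambda\,\mathcal{V}_0(X - C) \\
  &= (1-\lambda)\,EX + \lambda\,\bigl\{\, C + \mathcal{V}_0(X - C) \,\bigr\}.
\end{align*}
```

Since $\lambda > 0$, minimizing over $C$ gives $\mathcal{R}(X) = (1-\lambda)EX + \lambda\mathcal{R}_0(X)$ with the same minimizing set, so $\mathcal{S}(X) = \mathcal{S}_0(X)$. Subtracting $EX$ from $\mathcal{R}$ and from $\mathcal{V}$ then yields $\mathcal{D}(X) = \lambda\mathcal{D}_0(X)$ and $\mathcal{E}(X) = \lambda\mathcal{E}_0(X)$.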

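Mixing Theorem. For $k = 1, \ldots, r$ let $(\mathcal{R}_k, \mathcal{D}_k, \mathcal{V}_k, \mathcal{E}_k)$ be a regular quadrangle quartet with statistic
$\mathcal{S}_k$, and consider any weights $\lambda_k > 0$ with $\lambda_1 + \cdots + \lambda_r = 1$. A regular quadrangle quartet $(\mathcal{R}, \mathcal{D}, \mathcal{V}, \mathcal{E})$
with statistic $\mathcal{S}$ is given then by

\[
\begin{aligned}
\mathcal{S}(X) &= \lambda_1\mathcal{S}_1(X) + \cdots + \lambda_r\mathcal{S}_r(X), \\
\mathcal{R}(X) &= \lambda_1\mathcal{R}_1(X) + \cdots + \lambda_r\mathcal{R}_r(X), \\
\mathcal{D}(X) &= \lambda_1\mathcal{D}_1(X) + \cdots + \lambda_r\mathcal{D}_r(X), \\
\mathcal{V}(X) &= \min_{B_1, \ldots, B_r} \Big\{ \sum_{k=1}^r \lambda_k \mathcal{V}_k(X - B_k) \,\Big|\, \sum_{k=1}^r \lambda_k B_k = 0 \Big\}, \\
\mathcal{E}(X) &= \min_{B_1, \ldots, B_r} \Big\{ \sum_{k=1}^r \lambda_k \mathcal{E}_k(X - B_k) \,\Big|\, \sum_{k=1}^r \lambda_k B_k = 0 \Big\}.
\end{aligned}
\tag{3.24}
\]

Moreover $(\mathcal{R}, \mathcal{D}, \mathcal{V}, \mathcal{E})$ is monotonic if every $(\mathcal{R}_k, \mathcal{D}_k, \mathcal{V}_k, \mathcal{E}_k)$ is monotonic, and $(\mathcal{R}, \mathcal{D}, \mathcal{V}, \mathcal{E})$ is positively
homogeneous if every $(\mathcal{R}_k, \mathcal{D}_k, \mathcal{V}_k, \mathcal{E}_k)$ is positively homogeneous.

This generalizes a result in Rockafellar et al. [2008] which dealt only with positively homogeneous
quantifiers.61 The quadrangle in Example 10 illustrates it for a particular case.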

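Reverting Theorem. For $i = 1, 2$, let $(\mathcal{R}_i, \mathcal{D}_i, \mathcal{V}_i, \mathcal{E}_i)$ be a regular quadrangle quartet with statistic
$\mathcal{S}_i$. Then a regular quadrangle quartet $(\mathcal{R}, \mathcal{D}, \mathcal{V}, \mathcal{E})$ with statistic $\mathcal{S}$ is given by

\[
\begin{aligned}
\mathcal{S}(X) &= \tfrac{1}{2}\big[\mathcal{S}_1(X) - \mathcal{S}_2(-X)\big], \\
\mathcal{R}(X) &= EX + \tfrac{1}{2}\big[\mathcal{R}_1(X) + \mathcal{R}_2(-X)\big], \\
\mathcal{D}(X) &= \tfrac{1}{2}\big[\mathcal{D}_1(X) + \mathcal{D}_2(-X)\big] = \tfrac{1}{2}\big[\mathcal{R}_1(X) + \mathcal{R}_2(-X)\big], \\
\mathcal{V}(X) &= EX + \min_B \Big\{ \tfrac{1}{2}\big[\mathcal{V}_1(B + X) + \mathcal{V}_2(B - X)\big] - B \Big\}, \\
\mathcal{E}(X) &= \min_B \Big\{ \tfrac{1}{2}\big[\mathcal{E}_1(B + X) + \mathcal{E}_2(B - X)\big] \Big\}.
\end{aligned}
\tag{3.25}
\]

Positive homogeneity is preserved in this construction, but not monotonicity.

Example 11 illustrates a case where $(\mathcal{R}_1, \mathcal{D}_1, \mathcal{V}_1, \mathcal{E}_1)$ and $(\mathcal{R}_2, \mathcal{D}_2, \mathcal{V}_2, \mathcal{E}_2)$ coincide. The proof of the
Reverting Theorem takes advantage of bounds $\mathcal{E}_i(X) \ge \psi_i(EX)$ produced from (3.12).62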
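The two expressions given for the deviation in (3.25) agree because the expectation terms cancel. A short verification, using only the relation $\mathcal{D}_i(X) = \mathcal{R}_i(X) - EX$ from part (a) of the Quadrangle Theorem, runs as follows:

```latex
\begin{align*}
\mathcal{D}_1(X) + \mathcal{D}_2(-X)
  &= \bigl[\mathcal{R}_1(X) - EX\bigr] + \bigl[\mathcal{R}_2(-X) - E[-X]\bigr] \\
  &= \mathcal{R}_1(X) + \mathcal{R}_2(-X).
\end{align*}
```

The formula for $\mathcal{R}$ in (3.25) is then just $\mathcal{R}(X) = EX + \mathcal{D}(X)$. When the two quartets coincide, the deviation $\mathcal{D}(X) = \tfrac{1}{2}[\mathcal{D}_1(X) + \mathcal{D}_1(-X)]$ is symmetric, $\mathcal{D}(-X) = \mathcal{D}(X)$.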

61The proof is essentially the same as in that case, the main task being to demonstrate that $\mathcal{R}$ and $\mathcal{V}$ are closed and
the minimum over $B_1, \ldots, B_r$ is attained. The argument follows the pattern we have indicated above for the projection
part of the Quadrangle Theorem, making use of inequalities $\mathcal{E}_k(X) \ge \psi_k(EX)$ coming from (3.12).

62It starts with a direct calculation of the minimum of $\mathcal{E}(X - C)$ over $C$ with the $\min_B$ expression for $\mathcal{E}$ inserted. A
change of variables $C_1 = C - B$, $C_2 = -C - B$, shows that this yields the claimed $\mathcal{S}$ and $\mathcal{D}$. The corresponding $\mathcal{R}$ and
$\mathcal{V}$ are confirmed then from the quadrangle formulas.

A further operation that can be performed on risk measures is “inf-convolution,” cf. Barrieu and
El Karoui [2005]. This could likewise be articulated in a theorem along these lines.

4  Further  Model-Promoting  Results  and  Interpretations 

The  general  facts  in  Section  3  will  be  supplemented  in  this  section  by  more  detail  in  the  expectation 
case.  Claims  made  about  the  examples  of  expectation  quadrangles  in  Section  2  will  in  that  way  be 
confirmed.  Insight  will  be  provided  also  into  the  pattern  of  regret  versus  utility,  even  outside  the 
expectation case, and how it can affect the $\mathcal{D}$-$\mathcal{E}$ side of the quadrangle.

In relying on (3.13) for a one-to-one correspondence between regular measures of regret $\mathcal{V}$ and
regular measures of relative utility $\mathcal{U}$, we are in particular replacing the convexity of $\mathcal{V}$ with the
concavity of $\mathcal{U}$ and requiring, for a random variable $Y$ oriented toward gain, that

$\mathcal{U}(0) = 0$ but $\mathcal{U}(Y) < EY$ when $Y \not\equiv 0$.   (4.1)

This  is  where  the  term  “relative”  comes  in.  The  gain  in  Y  needs  to  be  viewed  as  gain  relative  to 
some  benchmark.  That  contrasts  with  the  way  utility  theory  is  ordinarily  articulated  in  terms  of  the 
“absolute”  utility  of  an  outcome.  But  practitioners  appreciate  nowadays  that  investors,  for  instance, 
are  highly  influenced  by  benchmarks  in  their  attitudes  toward  gain  or  loss. 

The case of expected utility, focused on $E[u(Y)]$ for a one-dimensional utility function $u$ giving
$u(y)$ for a sure gain $y$, serves well in explaining this. A large body of traditional theory in finance,
laid out authoritatively in Föllmer and Schied [2004], looks toward maximizing such an expression
under  various  side  conditions  in  putting  together  a  good  portfolio.  The  utility  function  u  captures  the 
preferences  of  an  investor,  and  the  expectation  deals  with  the  uncertainty  when  the  gain  y  turns  into 
a  random  variable  Y .  Standard  functions  u  have  logarithmic  forms  and  the  like,  and  there  is  often 
nothing  “relative”  about  them. 

In order to have a functional $\mathcal{U}(Y) = E[u(Y)]$ satisfy (4.1) and be closed concave,63 the natural
specialization is to require $u$ to be a function of $y$ with

$u$ closed concave and $u(0) = 0$ but $u(y) < y$ when $y \ne 0$.  (4.2)

Again,  the  sense  in  that  would  come  from  a  benchmark  interpretation,  namely  that  y  no  longer  stands 
for  an  amount  of  money  received  in  the  future  but  rather  an  increment  (positive  or  negative)  to  some 
reference amount. A utility function satisfying (4.2), but with “<” weakened to “≤,” is a normalized
utility in the terminology of Ben-Tal and Teboulle [2007]. Normalization to create these properties is
always  possible  in  the  expectation  case  because,  in  theory,  as  far  as  generating  a  preference  ordering 
for  y  values  is  concerned,  a  utility  u  is  only  determined  up  to  translations  and  an  arbitrary  scaling 
factor.64  For  our  quadrangle  scheme,  however,  such  normalization  is  not  merely  a  convenience  but 
essential. Expected utility depends not only on the ordering induced by $u$ on $(-\infty,\infty)$, but also on
the  “curvature”  aspects  of  u,  and  the  choice  of  a  benchmark  can  have  a  large  impact  on  that,  apart 
from  some  special  cases. 

A  utility  function  u  satisfying  (4.2)  is  paired  with  a  regret  function  v  satisfying 

$v$ closed convex and $v(0) = 0$ but $v(x) > x$ when $x \ne 0$.  (4.3)

63 Closed concavity requires the “upper” level sets of type $\ge c$ to be closed for all $c \in \mathbb{R}$, in contrast to closed convexity,
which requires all “lower” level sets of type $\le c$ to be closed.

64 Outside of the expectation case, it is still possible to shift to $\mathcal{U}(0) = 0$ as a “normalization,” but rescaling is insufficient
to get to $\mathcal{U}(Y) < EY$.

under the correspondence65

$v(x) = -u(-x)$,  $u(y) = -v(-y)$.  (4.4)

The properties in (4.3) are needed for $\mathcal{V}(X) = E[v(X)]$ to be a regular measure of regret. They are
crucial moreover in the correspondence between $\mathcal{V}$ and $\mathcal{E}$ at the bottom of the quadrangle in making
$\mathcal{E}(X) = E[e(X)]$ be a regular measure of error paired with $\mathcal{V}(X) = E[v(X)]$ under the relations

$e(x) = v(x) - x$,  $v(x) = x + e(x)$,  (4.5)

which entail having

$e$ closed convex and $e(0) = 0$ but $e(x) > 0$ when $x \ne 0$.  (4.6)

The  condition  on  the  utility  function  u  in  (4.2)  implies  that  u'(0)  =  1  when  u  is  differentiable  at  0, 
but  it  is  important  to  realize  that  u  might  not  be  differentiable  at  0,  and  this  could  even  be  desirable. 
From concavity, $u$ is sure at least to have right derivatives $u'_+(y)$ and left derivatives $u'_-(y)$ satisfying
$u'_-(y) \ge u'_+(y)$, usually with equality, but still maybe with $u'_-(0) > u'_+(0)$. This would mean that, in
terms of relative utility, the pain of a marginal loss relative to the benchmark is greater than the pleasure
of a marginal gain relative to the benchmark. Just such a disparity in reactions to gains and losses is
seen  in  practice  and  reflects,  at  least  in  part,  the  observations  in  Kahneman  and  Tversky  [1979]. 

In translating this from a concave utility function $u$ to a convex regret function $v$ as in (4.3), we
have, of course, right derivatives $v'_+(x)$ and left derivatives $v'_-(x)$ satisfying $v'_-(x) \le v'_+(x)$, usually
with equality, but perhaps with $v'_-(0) < v'_+(0)$. However, something more needs to be understood in
connection with the ability of $v$ to take on $\infty$ and how that affects the way derivatives are treated in
the formulas of the theorem below.

The convexity of $v$ implies that the effective domain $\operatorname{dom} v = \{x \mid v(x) < \infty\}$ is an interval in
$(-\infty,\infty)$ (not necessarily closed or bounded). If $x$ is the right endpoint of $\operatorname{dom} v$, the definition of the
right derivative naturally gives $v'_+(x) = \infty$; but just in case of doubt in some formula, this is also the
interpretation to give of $v'_+(x)$ when $x$ is off to the right of $\operatorname{dom} v$.66 Likewise, if $x$ is the left endpoint
of $\operatorname{dom} v$, or further to the left, then $v'_-(x) = -\infty$.

These  are  the  patterns  also  for  an  error  function  e  as  in  (4.6). 

For  the  fundamental  quadrangle  of  risk,  the  consequences  of  these  facts  in  the  expectation  case 
are  summarized  as  follows. 

Expectation Theorem. For functions $v$ and $e$ on $(-\infty,\infty)$ related by (4.5), the properties in (4.3)
amount to those in (4.6) and ensure that the functionals

$\mathcal{V}(X) = E[v(X)]$,  $\mathcal{E}(X) = E[e(X)]$,  (4.7)

form a corresponding pair consisting of a regular measure of regret and a regular measure of error.67
For $X \in \operatorname{dom}\mathcal{V} = \operatorname{dom}\mathcal{E}$ let $C^+(X) = \sup\{C \mid X - C \in \operatorname{dom}\mathcal{V}\}$ and $C^-(X) = \inf\{C \mid X - C \in \operatorname{dom}\mathcal{V}\}$.
The associated statistic $\mathcal{S}$ in the quadrangle generated from $\mathcal{V}$ and $\mathcal{E}$ is characterized then by

$\mathcal{S}(X) = \{C \mid E[e'_-(X - C)] \le 0 \le E[e'_+(X - C)]\} = \{C \mid E[v'_-(X - C)] \le 1 \le E[v'_+(X - C)]\}$  (4.8)


65 In this correspondence the graphs of $v$ and $u$ reflect to each other through the origin of $\mathbb{R}^2$.

66 The issue is that a random $X$ might produce such an outcome with probability 0, and yet one still needs to know
how to think of the formula.

67 Also, $\mathcal{V}$ corresponds then to a regular measure of relative utility $\mathcal{U}$ given by $\mathcal{U}(Y) = E[u(Y)]$ under (4.4) via (4.2).

subject to the modification that, in both cases, the right side is replaced by $\infty$ if $C \le C^-(X)$ and the
left side is replaced by $-\infty$ if $C \ge C^+(X)$. The quadrangle is completed then by setting

$\mathcal{D}(X) = E[e(X - C)]$ and $\mathcal{R}(X) = C + E[v(X - C)]$ for any/all $C \in \mathcal{S}(X)$.  (4.9)


Having $\mathcal{V}$ and $\mathcal{R}$ be monotonic corresponds (in tandem with convexity) to having $v(x) \le 0$ when
$x < 0$, or equivalently $e(x) \le |x|$ when $x < 0$. Positive homogeneity holds in the quadrangle if and
only if $v$ and $e$ have graphs composed of two linear pieces kinked at 0.

Beyond  the  aspects  of  this  theorem  that  are  already  evident,  the  key  ingredient  is  establishing 
(4.8). This is carried out by calculating the right and left derivatives of the convex function
$\phi(C) = E[e(X - C)]$ from their definitions and noting that $C$ belongs to $\operatorname{argmin}\phi$ if and only if
$\phi'_-(C) \le 0 \le \phi'_+(C)$. In situations where $v$ and $e$ are differentiable, the double inequalities in (4.8)
can be replaced simply by the equations $E[e'(X - C)] = 0$ and $E[v'(X - C)] = 1$.

We  proceed  now  to  illustrate  the  Expectation  Theorem  by  applying  it  to  justify  the  details  of  the 
examples  in  Section  2  that  belong  to  the  expectation  case. 

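Quantile-based quadrangle, Example 2 (including Example 3):

$e(x) = \frac{\alpha}{1-\alpha}\max\{0, x\} + \max\{0, -x\}$,  $v(x) = \frac{1}{1-\alpha}\max\{0, x\}$,  $u(y) = \frac{1}{1-\alpha}\min\{0, y\}$.

We have $\operatorname{dom}\mathcal{V} = \mathcal{L}^2(\Omega)$, $C^+(X) = \infty$, $C^-(X) = -\infty$, and

$$v'_-(x) = \begin{cases} \frac{1}{1-\alpha} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \qquad v'_+(x) = \begin{cases} \frac{1}{1-\alpha} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$

with a gap between left and right derivatives occurring only at $x = 0$. Then with $F_X^-(C)$ denoting the
left limit of $F_X$ at $C$ (the right limit $F_X^+(C)$ being just $F_X(C)$), we get

$E[v'_-(X - C)] = \frac{1}{1-\alpha}\operatorname{prob}\{X > C\} = \frac{1 - F_X(C)}{1-\alpha}$,
$E[v'_+(X - C)] = \frac{1}{1-\alpha}\operatorname{prob}\{X \ge C\} = \frac{1 - F_X^-(C)}{1-\alpha}$.

It follows thereby from (4.8) that $\mathcal{S}(X) = \{C \mid F_X^-(C) \le \alpha \le F_X(C)\}$ and therefore $\mathcal{S}(X) = q_\alpha(X)$.
Applying (4.9) yields $\mathcal{R}(X) = C + \frac{1}{1-\alpha}E\max\{0, X - C\} = C + \frac{1}{1-\alpha}\int_{(C,\infty)}(x - C)\,dF_X(x)$. Since
the probability of $(C,\infty)$ is $1 - F_X(C)$, this equals $\frac{1}{1-\alpha}\big[(F_X(C) - \alpha)C + \int_{(C,\infty)} x\,dF_X(x)\big]$, which is the
expectation of $X$ with respect to its “$\alpha$-tail distribution” as defined in Rockafellar and Uryasev [2002]
and used there to properly define $\bar{q}_\alpha(X)$ even under the possibility that $F_X(C) > \alpha$.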
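
The calculation above can be checked numerically. The following is a minimal sketch, assuming an equally weighted sample in place of the distribution of $X$; the function names, the sample and the level $\alpha = 0.75$ are illustrative choices, not taken from the text. It minimizes $C + \frac{1}{1-\alpha}E\max\{0, X - C\}$ over $C$ and compares the minimum value with the $\alpha$-tail expectation formula.

```python
def regret_objective(sample, alpha, c):
    """C + E[v(X - C)] with v(x) = max(0, x) / (1 - alpha), X uniform on `sample`."""
    excess = sum(max(0.0, x - c) for x in sample) / len(sample)
    return c + excess / (1.0 - alpha)


def tail_expectation(sample, alpha, c):
    """(1 / (1 - alpha)) * [(F_X(c) - alpha) * c + integral of x dF_X over (c, inf)]."""
    n = len(sample)
    cdf_at_c = sum(1 for x in sample if x <= c) / n
    upper_part = sum(x for x in sample if x > c) / n
    return ((cdf_at_c - alpha) * c + upper_part) / (1.0 - alpha)


sample = [float(k) for k in range(1, 11)]   # outcomes 1, ..., 10, equally likely
alpha = 0.75

# The objective is convex and piecewise linear in C, so a minimizer can be
# found among the sample points themselves.
q = min(sample, key=lambda c: regret_objective(sample, alpha, c))
risk = regret_objective(sample, alpha, q)
tail = tail_expectation(sample, alpha, q)
```

In this sample the minimizer is the unique point $C$ with $F_X^-(C) \le \alpha \le F_X(C)$, and the minimum value agrees with the tail expectation, including the correction term for the probability atom at the quantile.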

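Worst-case-based quadrangle, Example 5:

$$e(x) = \begin{cases} |x| & \text{if } x \le 0, \\ \infty & \text{if } x > 0, \end{cases} \qquad v(x) = \begin{cases} 0 & \text{if } x \le 0, \\ \infty & \text{if } x > 0, \end{cases} \qquad u(y) = \begin{cases} -\infty & \text{if } y < 0, \\ 0 & \text{if } y \ge 0. \end{cases}$$

We have $\operatorname{dom}\mathcal{V} = \mathcal{L}^2(\Omega)$, $C^+(X) = \infty$, $C^-(X) = \sup X$. In the $v$ part of (4.8) the left side equals 0
always and the right side equals 0 if $C < \sup X$ but (through the prescribed modification) equals $\infty$
if $C = \sup X$. Therefore, $C = \sup X$ is the unique element of $\mathcal{S}(X)$ (when that is finite).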


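Truncated-mean-based quadrangle, Example 7:

$$e(x) = \begin{cases} |x| - \frac{\beta}{2} & \text{if } |x| \ge \beta, \\ \frac{1}{2\beta}x^2 & \text{if } |x| \le \beta, \end{cases} \qquad v(x) = \begin{cases} -\frac{\beta}{2} & \text{if } x \le -\beta, \\ x + \frac{1}{2\beta}x^2 & \text{if } |x| \le \beta, \\ 2x - \frac{\beta}{2} & \text{if } x \ge \beta, \end{cases} \qquad u(y) = \begin{cases} 2y + \frac{\beta}{2} & \text{if } y \le -\beta, \\ y - \frac{1}{2\beta}y^2 & \text{if } |y| \le \beta, \\ \frac{\beta}{2} & \text{if } y \ge \beta. \end{cases}$$

This time, $\operatorname{dom}\mathcal{V} = \mathcal{L}^2(\Omega)$, so $C^+(X) = \infty$ and $C^-(X) = -\infty$. The statistic is determined by solving
$E[e'(X - C)] = 0$ for $C$, and this gives the result described because

$$\beta e'(x) = T_\beta(x) = \begin{cases} \beta & \text{if } x \ge \beta, \\ x & \text{if } -\beta \le x \le \beta, \\ -\beta & \text{if } x \le -\beta. \end{cases}$$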
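
As an illustration of solving $E[e'(X - C)] = 0$ in this example, the sketch below finds $C$ with $E[T_\beta(X - C)] = 0$ by bisection on an equally weighted sample. The function names, the sample and the tolerance are assumptions made for the illustration only; bisection is used simply because the left side is continuous and nonincreasing in $C$.

```python
def truncate(x, beta):
    """T_beta(x): clip x to the interval [-beta, beta]."""
    return max(-beta, min(beta, x))


def truncated_mean_statistic(sample, beta, tol=1e-12):
    """Solve E[T_beta(X - C)] = 0 for C by bisection (X uniform on `sample`)."""
    def g(c):
        return sum(truncate(x - c, beta) for x in sample) / len(sample)

    lo, hi = min(sample), max(sample)   # g(lo) >= 0 >= g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid                    # root lies to the right of mid
        else:
            hi = mid                    # root lies at mid or to its left
    return 0.5 * (lo + hi)


sample = [0.0, 1.0, 2.0, 3.0, 100.0]    # one gross outlier
c_small_beta = truncated_mean_statistic(sample, beta=1.0)
c_large_beta = truncated_mean_statistic(sample, beta=1000.0)
```

With $\beta$ larger than the spread of the sample no truncation occurs and the equation reduces to $E[X - C] = 0$, so the mean is recovered; with $\beta = 1$ the outlier is clipped and the solution for this particular sample is $2$, which here coincides with the sample median.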


Log-exponential-based  quadrangle,  Example  8: 

$e(x) = \exp x - x - 1$,  $v(x) = \exp x - 1$,  $u(y) = 1 - \exp(-y)$.

Here $\operatorname{dom}\mathcal{V} = \{X \mid E[\exp X] < \infty\}$. Because $E[\exp(X - C)] = \exp(-C)E[\exp X]$, we have $C^+(X) = \infty$
and $C^-(X) = -\infty$ for any $X \in \operatorname{dom}\mathcal{V}$, so the need for a modification of the bounds in (4.8) is avoided.
Indeed, since $v'(x) = \exp x$, we just have an equation to solve for $C$, namely $E[\exp(X - C)] = 1$. This
equation can be rewritten as $E[\exp X] = \exp C$, which yields $C = \log E[\exp X]$ as $\mathcal{S}(X)$. Substituting
that into $C + \mathcal{V}(X - C)$, we get $\mathcal{R}(X) = \log E[\exp X]$ and the quadrangle is confirmed.
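
A quick numerical check of this derivation, under the assumption of an equally weighted sample (the sample values and function names are illustrative): the candidate $C = \log E[\exp X]$ should satisfy the first-order equation $E[\exp(X - C)] = 1$, and the value $C + E[v(X - C)]$ should then equal $C$ itself.

```python
import math


def log_exp_statistic(sample):
    """S(X) = log E[exp X] for X uniform on `sample`."""
    return math.log(sum(math.exp(x) for x in sample) / len(sample))


def regret_objective(sample, c):
    """C + E[v(X - C)] with v(x) = exp(x) - 1."""
    return c + sum(math.exp(x - c) - 1.0 for x in sample) / len(sample)


sample = [-1.0, 0.0, 0.5, 2.0]
c_star = log_exp_statistic(sample)

# First-order condition E[v'(X - C)] = E[exp(X - C)] = 1 at C = c_star.
first_order = sum(math.exp(x - c_star) for x in sample) / len(sample)

# Risk value R(X) = C + V(X - C) evaluated at the statistic.
risk = regret_objective(sample, c_star)
```

Perturbing `c_star` in either direction increases `regret_objective`, consistent with $C$ being the minimizer in the trade-off formula.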

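Rate-Based Quadrangle, Example 9:

$$e(x) = \begin{cases} \log\frac{1}{1-x} - x & \text{if } x < 1, \\ \infty & \text{if } x \ge 1, \end{cases} \qquad v(x) = \begin{cases} \log\frac{1}{1-x} & \text{if } x < 1, \\ \infty & \text{if } x \ge 1, \end{cases} \qquad u(y) = \begin{cases} \log(1 + y) & \text{if } y > -1, \\ -\infty & \text{if } y \le -1. \end{cases}$$

Here $\operatorname{dom}\mathcal{V} = \{X < 1 \mid E[\log\frac{1}{1-X}] < \infty\}$, so $C^+(X) = 1 - \sup X$ and $C^-(X) = -\infty$. Because $v$ is
differentiable (where finite), we have an equation to solve in (4.8): $E\big[\frac{1}{1-X+C}\big] = 1$. The solution is
the statistic $\mathcal{S}(X)$.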

Quadrangles from kinked utility and regret. More examples beyond the differentiable case of
the Expectation Theorem can be produced by starting from an “absolute” utility function $u_0(y_0)$
that is differentiable, increasing and strictly concave, introducing a benchmark value $B$ and a “kink”
parameter $\delta > 0$, and defining

$u(y) = \frac{u_0(y + B) - u_0(B)}{u_0'(B)} + \delta\min\{0, y\}$.  (4.10)

This will satisfy $u(0) = 0$ and $u(y) < y$ when $y \ne 0$, and it will be differentiable when $y \ne 0$, but have

$u'_+(0) = 1$ but $u'_-(0) = 1 + \delta$.  (4.11)

The kink parameter $\delta$ models the extra pain experienced in falling short of the benchmark, in contrast
to the milder pleasure experienced in exceeding it. From this $u$ it is straightforward to pass to the
corresponding $v$, $e$, and the full quadrangle associated with them by the theorem. In general, that
quadrangle will depend on both $B$ and $\delta$, but in special situations like CARA or HARA utilities68 the
$B$ dependence can drop out or reduce to simple rescaling.
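
To make the construction concrete, here is a small sketch of (4.10), assuming a CARA “absolute” utility $u_0(y) = -\exp(-ay)$; the coefficient `a`, the kink value and the helper names are hypothetical choices for the illustration. It estimates the one-sided slopes at 0 by finite differences and compares two benchmark values.

```python
import math


def kinked_utility(u0, du0, benchmark, delta):
    """Build u(y) = [u0(y + B) - u0(B)] / u0'(B) + delta * min(0, y)."""
    def u(y):
        base = (u0(y + benchmark) - u0(benchmark)) / du0(benchmark)
        return base + delta * min(0.0, y)
    return u


a = 0.5                                    # hypothetical risk-aversion coefficient


def u0(y):
    """CARA "absolute" utility (an assumption for this example)."""
    return -math.exp(-a * y)


def du0(y):
    """Derivative of the CARA utility."""
    return a * math.exp(-a * y)


u_b0 = kinked_utility(u0, du0, benchmark=0.0, delta=0.25)
u_b3 = kinked_utility(u0, du0, benchmark=3.0, delta=0.25)

h = 1e-6
right_slope = (u_b0(h) - u_b0(0.0)) / h    # approximately 1
left_slope = (u_b0(0.0) - u_b0(-h)) / h    # approximately 1 + delta
```

For this CARA choice the normalized function works out to $u(y) = (1 - \exp(-ay))/a + \delta\min\{0, y\}$, so `u_b0` and `u_b3` agree up to rounding, which illustrates the remark that the $B$ dependence can drop out.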

The  surprising  fact  is  that  all  such  manipulations  are  propagated  by  the  quadrangle  scheme  into 
applications  not  just  to  risk  management  and  optimization  but  also  to  statistical  estimation.  Those 
applications  will  be  discussed  further  in  Section  5. 

68 See [Föllmer and Schied, 2004, pages 68-69].

General interpretations of the quadrangle “statistic.” Returning finally to the general
level of the correspondence $\mathcal{U} \leftrightarrow \mathcal{V}$ between relative utility and regret in (3.13) we look at ways of
interpreting the trade-off formula $\mathcal{R}(X) = \min_C\{C + \mathcal{V}(X - C)\}$. Through a change of variables
$Y = -X$, $W = -C$, switching loss to gain, this corresponds to

$-\mathcal{R}(-Y) = \max_W\{W + \mathcal{U}(Y - W)\}$.  (4.12)

Considerations were focused in Ben-Tal and Teboulle [2007] on the expectation case, but an interpretation
suggested there works well for (4.12) in general. To begin with, note that in adding $W$ to
$\mathcal{U}(Y - W)$ it is essential that $W$ be measured in the same units as $\mathcal{U}(Y - W)$, and moreover they have
to be the same units as those of $Y$. A simple case where this makes perfect sense is the one in which
the units are money units, like dollars. Then $W$ represents an income that is certain, whereas $Y - W$
is residual income that is uncertain; $\mathcal{U}$ assigns to that uncertain income an equally desirable amount
of certain income in something akin to a discount. This leads, in Ben-Tal and Teboulle [2007], to a $W$
giving the max in (4.12) being called an optimized certainty equivalent for $Y$.
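
The trade-off in (4.12) can be explored numerically in the expectation case. The sketch below assumes $\mathcal{U}(Y) = E[u(Y)]$ with $u(y) = 1 - \exp(-y)$, an equally weighted sample of gains, and a simple ternary search for the maximizer; all names and data are illustrative. For this $u$, setting the derivative of $W + E[u(Y - W)]$ to zero gives $W = -\log E[\exp(-Y)]$, which serves as the comparison value.

```python
import math


def oce_objective(gains, w):
    """W + E[u(Y - W)] with u(y) = 1 - exp(-y), Y uniform on `gains`."""
    return w + sum(1.0 - math.exp(-(y - w)) for y in gains) / len(gains)


def maximize_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


gains = [-1.0, 0.0, 1.0, 3.0]
w_star = maximize_concave(lambda w: oce_objective(gains, w), min(gains), max(gains))

# Closed form for this particular u, obtained from 1 - E[exp(-(Y - W))] = 0.
closed_form = -math.log(sum(math.exp(-y) for y in gains) / len(gains))
```

In this case the maximizing $W$ and the maximum value coincide, mirroring the loss-side fact that the statistic and the risk agree in the log-exponential quadrangle.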

Much the same can be said about the regret version of trade-off, $\mathcal{R}(X) = \min_C\{C + \mathcal{V}(X - C)\}$.
There,  C  is  a  loss  that  is  certain,  X  —  C  is  a  residual  loss  that  is  uncertain.  The  regret  measure  V 
assigns  to  X  —  C  an  amount  of  money  that  could  be  deemed  adequate  as  immediate  compensation 
for  taking  on  the  burden  of  X  —  C.  It  is  possible  to  elaborate  this  with  ideas  of  insurance,  insurance 
premium,  “deductibles,”  and  so  forth.  For  some  insurance  interpretations  in  the  utility  context  of 
(4.12),  see  Ben-Tal  and  Teboulle  [2007]. 

Although  these  “min”  formulas  and  interpretations  are  natural  in  their  own  right,  the  special 
insight  from  the  risk  quadrangle,  namely  that  they  have  a  parallel  life  in  theoretical  statistics,  is  new. 

5  Quadrangle  Roles  in  Optimization  and  Regression 

Applications  involving  the  quantifiers  on  both  sides  of  the  risk  quadrangle  have  provided  key  motivation 
and  guidance  for  the  theory  that  has  been  laid  out.  The  purpose  of  this  section  is  to  explain  that 
background  and  indicate  advances  that  the  theory  now  brings. 

Optimization. Risk in the sense quantified by a risk measure $\mathcal{R}$ is central in the management
and control of cost or loss. For a hazard variable $X$, the crucial issue there is how to model a “soft”
upper bound, i.e., a condition that the outcomes of $X$ be “adequately” $\le C$ for some $C$. As already
explained in Section 1, the broad prescription for handling this is to pass to a numerical inequality
$\mathcal{R}(X) \le C$ through some choice of a risk measure $\mathcal{R}$, and many possibilities for $\mathcal{R}$ have been offered.
Of course $C$ can be taken to be 0 without any real loss of generality.

A choice of $\mathcal{R}$ corresponds to an expression of preferences toward risk, but it might not yet be
clear why some measures of risk are better motivated or computationally more tractable than others.
The key challenge is that most applications require more than just looking at $\mathcal{R}(X)$ for a single $X$,
as far as optimization is concerned. Usually instead, there is a random variable that depends on
parameters $x_1,\dots,x_n$. We have $X(x_1,\dots,x_n)$ and it becomes important to know how the numerical
surrogate $\mathcal{R}(X(x_1,\dots,x_n))$ depends on $x_1,\dots,x_n$. This is where favorable conditions imposed on $\mathcal{R}$,
like convexity and monotonicity, are indispensable.

Motivations  in  optimization  modeling  are  important  in  particular.  For  insight,  consider  first  a 
standard type of deterministic optimization problem, without uncertainty, in which $x = (x_1,\dots,x_n)$
is the decision vector, namely

$(\mathcal{P})$  minimize $f_0(x)$ over all $x \in S \subset \mathbb{R}^n$ subject to $f_i(x) \le 0$ for $i = 1,\dots,m$.

A decision $x$ selected from the set $S$ results in numerical values $f_0(x), f_1(x),\dots,f_m(x)$, which can
be subjected to the usual techniques of optimization methodology. Suppose next, though, that these
cost-like expressions are uncertain through dependence on additional variables (random variables)
whose realizations will not be known until later. A decision $x$ merely results then in random variables69

$X_0(x) = \underline{f}_0(x)$,  $X_1(x) = \underline{f}_1(x)$,  ...,  $X_m(x) = \underline{f}_m(x)$,  (5.1)

which  can  only  be  shaped  in  their  distributions  through  the  choice  of  x,  not  pinned  down  to  specific 
values.  Now  there  is  no  longer  a  single,  evident  answer  to  how  optimization  should  be  viewed,  but 
risk  measures  can  come  to  the  rescue. 

As  proposed  in  Rockafellar  [2007],  one  can  systematically  pass  to  a  stochastic  optimization  problem 
in  the  format70 

$(\bar{\mathcal{P}})$  minimize $\bar{f}_0(x) = \mathcal{R}_0(\underline{f}_0(x))$ over $x \in S$ subject to $\bar{f}_i(x) = \mathcal{R}_i(\underline{f}_i(x)) \le 0$, $i = 1,\dots,m$,

in which an individually selected “measure of risk” $\mathcal{R}_i$ has been combined with each $\underline{f}_i(x)$ to arrive
at a numerical (nonrandom) function $\bar{f}_i$ of the decision vector $x$.71

An issue that must then be addressed is how the properties of $\bar{f}_i(x)$ with respect to $x$ relate to those
of $\underline{f}_i(x)$ through the choice of $\mathcal{R}_i$, and whether those properties are conducive to good optimization
modeling and solvability. This is not to be taken for granted, because seemingly attractive examples
like $\mathcal{R}_i(X) = EX + \lambda_i\sigma(X)$ with $\lambda_i > 0$ or $\mathcal{R}_i(X) = q_{\alpha_i}(X) = \mathrm{VaR}_{\alpha_i}(X)$ with $0 < \alpha_i < 1$ are known
to suffer from troubles with “coherency” in the sense of Artzner et al. [1999].

Convexity Theorem.72 In problem $(\bar{\mathcal{P}})$, the convexity of $\bar{f}_i(x)$ with respect to $x$ is assured if $\underline{f}_i(x)$
is linear in $x$ and $\mathcal{R}_i$ is a regular measure of risk, or if $\underline{f}_i(x)$ is convex in $x$ and $\mathcal{R}_i$ is, in addition, a
monotonic measure of risk.

The  huge  advantage  of  having  the  functions  fi  be  convex  is  that  then,  with  the  set  S  also  convex, 
$(\bar{\mathcal{P}})$ is an optimization problem of convex type. Such problems are vastly easier to solve in computation.

The use of $\mathcal{R}_i(X) = q_{\alpha_i}(X) = \mathrm{VaR}_{\alpha_i}(X)$ in this setting could destroy whatever underlying convexity
with respect to $x = (x_1,\dots,x_n)$ might be available in the problem data, because this measure
of risk lacks convexity; it is not regular and not coherent. The shortcoming of $\mathcal{R}_i(X) = EX + \lambda_i\sigma(X)$
is different: it fails in general to be monotonic. The absence of monotonicity threatens the transmittal
of convexity of $\underline{f}_i(x)$ to $\bar{f}_i(x)$. However, $\bar{f}_i(x)$ can still be convex in $x$, on the basis of the Convexity
Theorem, as long as $\underline{f}_i(x)$ is linear in $x$. This could be useful in applications to financial optimization,
because linearity with respect to $x$, as a vector of “portfolio weights,” is often encountered there.
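
The failure of monotonicity for $EX + \lambda\sigma(X)$ can be seen in a two-outcome example. The sketch below is only an illustration; the outcomes, probabilities and $\lambda = 2$ are hypothetical. It exhibits a loss variable $X \le 0$ whose risk value nevertheless exceeds that of the constant 0.

```python
import math


def mean_plus_deviation(outcomes, probs, lam):
    """R(X) = EX + lam * sigma(X) for a discrete random variable."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return mean + lam * math.sqrt(var)


lam = 2.0

# X is a loss that is never above 0 and is occasionally a gain of 10 (loss -10).
risk_x = mean_plus_deviation([-10.0, 0.0], [0.1, 0.9], lam)

# The constant 0 is at least as large as X in every outcome.
risk_zero = mean_plus_deviation([0.0], [1.0], lam)
```

Here $EX = -1$ and $\sigma(X) = 3$, so the risk value of $X$ is $5$ while that of the constant $0$ is $0$, even though $X \le 0$ always; a monotonic measure of risk could not rank the two this way.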

69 We employ underbars in this discussion to indicate uncertainty. The overbars appearing later emphasize that the
random variable depending on $x$ has been converted to a nonrandom numerical function of $x$.

70 If taken too literally, this prescription could be simplistic. When uncertainty is present, much closer attention must
be paid to whether the objective and constraint structure in the deterministic formulation itself was well chosen. The
effects of possible recourse actions when constraints are violated may need to be brought in. Whether risk measures
should be applied to the $\underline{f}_i$'s individually or to a combination passed through some joint expression must be considered
as well.

71 The constraint modeling in $(\bar{\mathcal{P}})$ follows the prescription that $\mathcal{R}_i(\underline{f}_i(x)) \le 0$ provides a rigorous interpretation to
the desire of having $\underline{f}_i(x)$ “adequately” $\le 0$, but the motivation for the treatment of the objective in $(\bar{\mathcal{P}})$ may be less
clear. Actually, it follows the same prescription. Choosing $x$ to minimize $\mathcal{R}_0(\underline{f}_0(x))$ can be identified with choosing a
pair $(x, C_0)$ subject to $\mathcal{R}_0(\underline{f}_0(x)) \le C_0$ so as to get $C_0$ as low as possible, and the inequality $\mathcal{R}_0(\underline{f}_0(x)) \le C_0$ models
having $\underline{f}_0(x)$ “adequately” $\le C_0$. This is valuable in handling the dangers of “cost overruns.”

72 This extends, in an elementary way, a principle in Rockafellar [2007].

73 Convexity of the random variable $\underline{f}_i(x)$ with respect to $x$ refers to having $\underline{f}_i((1-\lambda)x_0 + \lambda x_1) \le (1-\lambda)\underline{f}_i(x_0) + \lambda\underline{f}_i(x_1)$
as a relation among random variables, i.e., with “almost surely” coming in.

Another example of a measure of risk that is regular without being monotonic is the reverted
CVaR in Example 11: $\mathcal{R}_i(X) = EX + \frac{1}{2}[\mathrm{CVaR}_{\alpha_i}(X) + \mathrm{CVaR}_{\alpha_i}(-X)]$. Once more, although this
choice would not preserve convexity in general, it would do so when $\underline{f}_i(x)$ is linear in $x$.

A question of modeling motivation must be confronted here. Why would one ever wish to use in a
stochastic optimization problem $(\bar{\mathcal{P}})$ a regular risk measure that is not monotonic, even in applications
with linearity in $x$, when so many choices do have that property? An interesting justification can
actually be given, which could sometimes make sense in finance, at least. The rationale has to do
with skepticism about the data in the model and especially a wish to not rely too much on data in
the extreme lower tail of a cost distribution. Optimization with today’s data will be succeeded by
optimization with tomorrow’s data, all data being imperfect. It would be wrong to swing very far in
response to ephemeral changes, at least in formulating the objective function $\bar{f}_0(x) = \mathcal{R}_0(\underline{f}_0(x))$.

The following idea comes up: replace this objective, in the case of a regular monotonic measure of
risk $\mathcal{R}_0$, by a measure of risk having the form

$\widetilde{\mathcal{R}}_0(X) = \mathcal{R}_0(X) + \mathcal{D}(X)$ for some regular measure of deviation $\mathcal{D}$.  (5.2)

This  would  be  another  regular  measure  of  risk,  even  if  not  monotonic.  The  deviation  term  would  be 
designed  to  have  a  “stabilizing”  effect. 

If a choice like $\mathcal{R}_i(X) = q_{\alpha_i}(X) = \mathrm{VaR}_{\alpha_i}(X)$ ought to be shunned when convexity in $(\bar{\mathcal{P}})$ is to be
promoted, what might be the alternative? This is a serious issue because risk constraints involving
this choice are very common, especially in reliability engineering,74 because

$q_{\alpha_i}(\underline{f}_i(x)) \le 0 \iff \operatorname{prob}\{\underline{f}_i(x) \le 0\} \ge \alpha_i$.  (5.3)

A strong argument can be made for passing from quantiles/VaR to superquantiles/CVaR by instead
taking $\mathcal{R}_i(X) = \bar{q}_{\alpha_i}(X) = \mathrm{CVaR}_{\alpha_i}(X)$. This has the effect of replacing “probability of failure”
by an alternative called “buffered probability of failure,” which is safer and easier to work with
computationally; see Rockafellar and Royset [2010].

The claim that problem-solving may be easier with CVaR than with VaR could seem surprising
from the angle that $\mathrm{CVaR}_\alpha(X)$ is defined as a conditional expectation in a “tail” which is dependent
on $\mathrm{VaR}_\alpha(X)$, yet it rests on the characterization in (3.21). We have explained in Rockafellar and
Uryasev [2002]75 how, in the case of $(\bar{\mathcal{P}})$ with $\mathcal{R}_i = \mathrm{CVaR}_{\alpha_i}$ for each $i$, one can expand $\mathrm{CVaR}_{\alpha_i}(\underline{f}_i(x))$
through (3.21) into an expression involving an auxiliary parameter $C_i$ and go on to minimize not only
with respect to $x$ but also simultaneously with respect to the $C_i$'s. This has the benefit not only of
simplifying the overall minimization but also of providing, along with the optimal solution $\bar{x}$ to $(\bar{\mathcal{P}})$,
corresponding $\mathrm{VaR}_{\alpha_i}(\underline{f}_i(\bar{x}))$ values as the optimal $C_i$'s.

Now  we  are  in  position  to  point  out,  on  the  basis  of  the  risk  quadrangle,  that  this  technique  has  a 
new  and  far-reaching  extension. 

Regret Theorem. Consider a stochastic optimization problem $(\bar{\mathcal{P}})$ in which each $\mathcal{R}_i$ is a regular
measure of risk coming from a regular measure of regret $\mathcal{V}_i$ with associated statistic $\mathcal{S}_i$ by the
quadrangle formulas

$\mathcal{R}_i(X) = \min_C\{C + \mathcal{V}_i(X - C)\}$,  $\mathcal{S}_i(X) = \operatorname{argmin}_C\{C + \mathcal{V}_i(X - C)\}$.  (5.4)

74 The article Samson et al. [2009] furnishes illuminating background.

75 See also the tutorial paper Rockafellar [2007].

Solving $(\bar{\mathcal{P}})$ can be cast then as solving the expanded problem

$(\bar{\mathcal{P}}')$  choose $x = (x_1,\dots,x_n)$ and $C_0, C_1,\dots,C_m$ to
minimize $C_0 + \mathcal{V}_0(\underline{f}_0(x) - C_0)$ over $x \in S$, $C_i \in \mathbb{R}$,
subject to $C_i + \mathcal{V}_i(\underline{f}_i(x) - C_i) \le 0$ for $i = 1,\dots,m$.

An optimal solution $(\bar{x}, \bar{C}_0, \bar{C}_1,\dots,\bar{C}_m)$ to problem $(\bar{\mathcal{P}}')$ provides as $\bar{x}$ an optimal solution to problem
$(\bar{\mathcal{P}})$ and as $\bar{C}_i$ a corresponding value of the statistic $\mathcal{S}_i(\underline{f}_i(\bar{x}))$ for $i = 0, 1,\dots,m$.
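
The content of the theorem can be illustrated on a toy instance with a CVaR objective and no constraints. The sketch below is a brute-force illustration under stated assumptions, not a practical solver: the two loss scenarios sets, the level $\alpha = 0.75$, the one-dimensional decision $x \in [0, 1]$ and the grid are all hypothetical. It minimizes the expanded objective jointly over $(x, C)$ and compares the result with minimizing CVaR over $x$ alone.

```python
def portfolio_losses(x, losses_a, losses_b):
    """Scenario losses of the mix x * a + (1 - x) * b, which is linear in x."""
    return [x * a + (1.0 - x) * b for a, b in zip(losses_a, losses_b)]


def expanded_objective(losses, alpha, c):
    """C + V(X - C) with V(X) = E[max(0, X)] / (1 - alpha), equally likely scenarios."""
    excess = sum(max(0.0, loss - c) for loss in losses) / len(losses)
    return c + excess / (1.0 - alpha)


def cvar_direct(losses, alpha):
    """Average of the worst (1 - alpha) fraction of equally likely losses,
    splitting the probability atom at the quantile when necessary."""
    n = len(losses)
    remaining = (1.0 - alpha) * n          # tail mass in units of 1/n
    total = 0.0
    for loss in sorted(losses, reverse=True):
        weight = min(1.0, remaining)
        total += weight * loss
        remaining -= weight
        if remaining <= 0.0:
            break
    return total / ((1.0 - alpha) * n)


# Hypothetical scenario losses for two assets (10 equally likely scenarios).
losses_a = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 10.0]
losses_b = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -1.0, -2.0, -2.0]
alpha = 0.75
grid = [k / 100.0 for k in range(101)]

# Expanded problem: minimize jointly over the decision x and the auxiliary C.
joint_value, x_joint, c_joint = min(
    (expanded_objective(portfolio_losses(x, losses_a, losses_b), alpha, c), x, c)
    for x in grid
    for c in portfolio_losses(x, losses_a, losses_b)
)

# Original problem: minimize the CVaR of the loss over x alone.
direct_value, x_direct = min(
    (cvar_direct(portfolio_losses(x, losses_a, losses_b), alpha), x) for x in grid
)

# The 0.75-quantile (VaR) of 10 equally likely outcomes is the 8th smallest.
var_at_solution = sorted(portfolio_losses(x_joint, losses_a, losses_b))[7]
```

The two optimal values agree, and the optimal auxiliary variable `c_joint` is the quantile of the loss at the optimal decision, as the theorem asserts for the statistic. In practice the expanded CVaR problem with linear losses is usually solved as a linear program rather than by grid search.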

The Mixing Theorem of Section 3 can be combined with the Regret Theorem. When $\mathcal{V}_i$ is itself
expressed by a minimization formula in extra parameters, these can be brought into $(\bar{\mathcal{P}})$ as well.

The idea behind the Regret Theorem is not restricted to regret measures. It can operate just as
well for deviation measures in terms of error measures through the quadrangle principle that

$\mathcal{D}(X) \le c \iff \mathcal{E}(X - C) \le c$ for a choice of $C \in \mathbb{R}$.

Estimation. The topic of generalized regression is next on the agenda. As explained in Section 1,
this concerns the approximation of a given random variable $Y$ by a function $f(X_1,\dots,X_n)$ of other
random variables $X_1,\dots,X_n$. By the regression being “generalized” we mean that the difference
$Z_f = Y - f(X_1,\dots,X_n)$ may be assessed for its nonzeroness by an error measure $\mathcal{E}$ different from the
one in “least-squares” as in Example 1, or for that matter even from the kind in quantile regression, as
in Example 2. The case of generalized linear regression, where the functions $f$ in the approximation
are limited to the form

$f(x_1,\dots,x_n) = C_0 + C_1x_1 + \cdots + C_nx_n$ (the linear case),  (5.5)

has already been studied in Rockafellar et al. [2008], but only for error measures $\mathcal{E}$ that are positively
homogeneous. Here we go beyond those limitations and investigate the problem:

minimize $\mathcal{E}(Z_f)$ over all $f \in \mathcal{C}$, where $Z_f = Y - f(X_1,\dots,X_n)$,  (5.6)

for given random variables $X_1,\dots,X_n$, $Y$, and some given class $\mathcal{C}$ of functions $f$.

Taking $\mathcal{C}$ to be the class in (5.5) with respect to all possible coefficients $C_0, C_1,\dots,C_n$ would
specialize to linear regression, pure and simple. Then $\mathcal{E}(Z_f)$ would be a function of these coefficients
and we would be minimizing over $(C_0, C_1,\dots,C_n) \in \mathbb{R}^{n+1}$. However, even in the linear case there
could be further specialization through placing conditions on some of the coefficients, such as perhaps
nonnegativity. In fact, a broad example of the kinds of classes of regression functions that can be
brought into the picture is the following:76

$\mathcal{C}$ = all the functions $f(x_1,\dots,x_n) = C_0 + C_1h_1(x_1,\dots,x_n) + \cdots + C_mh_m(x_1,\dots,x_n)$
for given $h_1,\dots,h_m$ on $\mathbb{R}^n$ and coefficient vectors $(C_1,\dots,C_m)$ in a given set $C \subset \mathbb{R}^m$.  (5.7)

Motivation for generalized regression comes from applications in which $Y$ has the cost/loss orientation
that we have been emphasizing in this project. Underestimation might then be more dangerous
than overestimation, and that could suggest using an asymmetric error measure $\mathcal{E}$, with
$\mathcal{E}(Z_f) \ne \mathcal{E}(-Z_f)$.

Further  motivation  comes  from  “factor  models”  and  other  such  regression  techniques  in  finance 
and  engineering,  which  might  have  unexpected  consequences  when  utilized  in  stochastic  optimization 

76 It should also be kept in mind that a possibly nonlinear change of scale in the variables, such as passing to logarithms,
could be executed prior to this depiction.

because of interactions with parameterization by the decision vector $x$. For instance, if one of the
random “costs” $\underline{f}_i(x)$ in problem $(\bar{\mathcal{P}})$ is estimated by such a technique as $\underline{g}_i(x)$, it may be hard to
determine the effects this could have on the optimal decision. We have argued in Rockafellar et al.
[2008], and demonstrated with specific results, that it might be wise to “tune” the regression to the risk
measure $\mathcal{R}_i$ applied to $\underline{f}_i(x)$ in $(\bar{\mathcal{P}})$. This would mean passing around the fundamental quadrangle
from $\mathcal{R}_i$ to an error measure $\mathcal{E}_i$ in the same quartet.

Regression Theorem. Consider problem (5.6) for random variables $X_1,\dots,X_n$ and $Y$ in the case of $\mathcal{E}$ being a regular measure of error and $\mathcal{C}$ being a class of functions $f:\mathbb{R}^n \to \mathbb{R}$ such that

$$f \in \mathcal{C} \;\Longrightarrow\; f + C \in \mathcal{C} \ \text{ for all } C \in \mathbb{R}. \tag{5.8}$$

Let $\mathcal{D}$ and $\mathcal{S}$ correspond to $\mathcal{E}$ as in the Quadrangle Theorem. Problem (5.6) is equivalent then to:

$$\text{minimize } \mathcal{D}(Z_f) \text{ over all } f \in \mathcal{C} \text{ such that } 0 \in \mathcal{S}(Z_f), \tag{5.9}$$

which in the case of a class $\mathcal{C}$ as in (5.7) and $H_i = h_i(X_1,\dots,X_n)$ comes down to

$$\text{minimize } \mathcal{D}(Y - [C_1H_1 + \cdots + C_mH_m]), \text{ then take } C_0 \in \mathcal{S}(Y - [C_1H_1 + \cdots + C_mH_m]). \tag{5.10}$$

Moreover if $\mathcal{E}$ is of expectation type and $\mathcal{C}$ includes a function $f$ satisfying

$$\begin{array}{l} f(x_1,\dots,x_n) \in \mathcal{S}(Y(x_1,\dots,x_n)) \ \text{ almost surely for } (x_1,\dots,x_n) \in D, \\ \text{where } Y(x_1,\dots,x_n) = Y_{X_1=x_1,\dots,X_n=x_n} \ \text{(conditional distribution)}, \end{array} \tag{5.11}$$

with $D$ being the support of the distribution in $\mathbb{R}^n$ induced by $X_1,\dots,X_n$,77 then that $f$ solves the regression problem and tracks this conditional statistic78 in the sense that

$$f(X_1,\dots,X_n) \in \mathcal{S}(Y(X_1,\dots,X_n)) \ \text{ almost surely.} \tag{5.12}$$
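
A minimal numerical sketch of the two-step form (5.10) may help, under assumptions that are not in the text: the error measure is taken to be the mean-square ($\mathcal{L}^2$) error, for which the paired deviation is the standard deviation and the statistic is the mean; there is a single regressor with $h_1(x) = x$; and the data, the helper names (`mean`, `std`, `golden_min`) and the search bracket are hypothetical. The slope is found by minimizing the deviation of the residual, the intercept is then taken as the statistic of the residual, and the outcome is compared with ordinary least squares.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))

def golden_min(fun, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a convex function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c              # minimizer lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d              # minimizer lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2

# Hypothetical sample of (X, Y) pairs, equally weighted.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

# Step 1: minimize the deviation D(Y - C1*X), here D = standard deviation.
c1 = golden_min(lambda c: std([y - c * x for x, y in zip(xs, ys)]), -10.0, 10.0)
# Step 2: take C0 in S(Y - C1*X), here S = mean.
c0 = mean([y - c1 * x for x, y in zip(xs, ys)])

# Reference: ordinary least squares in closed form.
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

print(c1, c0, slope, intercept)
```

For other quadrangles the same two steps would apply with `std` and `mean` replaced by the corresponding deviation and statistic; the golden-section search is only a convenient one-dimensional minimizer, not part of the theorem.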

The first part of this result generalizes [Rockafellar et al., 2008, Theorem 3.2] on linear regression through elementary extension of the same proof. The specialization in (5.10) relies on $\mathcal{D}(Z - C_0) = \mathcal{D}(Z)$ and $\mathcal{S}(Z - C_0) = \mathcal{S}(Z) - C_0$. The second part is new. It comes from the observation that, in the expectation case, if $f$ satisfies (5.11), then for any other $g \in \mathcal{C}$ one has

$$\mathcal{E}(Y(x_1,\dots,x_n) - f(x_1,\dots,x_n)) \le \mathcal{E}(Y(x_1,\dots,x_n) - g(x_1,\dots,x_n)) \ \text{ almost surely for } (x_1,\dots,x_n) \in D.$$

When $\mathcal{E}$ is of expectation type, this inequality can be “integrated” over the distribution of $(X_1,\dots,X_n)$ to obtain $\mathcal{E}(Y(X_1,\dots,X_n) - f(X_1,\dots,X_n)) \le \mathcal{E}(Y(X_1,\dots,X_n) - g(X_1,\dots,X_n))$.

Apart from that special circumstance, the question of the existence of an optimal regression function $f \in \mathcal{C}$ has not been addressed in the theorem, because we are reluctant in the present context to delve deeply into the possible structure of the class $\mathcal{C}$. But existence in the case of linear regression has been covered in [Rockafellar et al., 2008, Theorem 3.1], and similar considerations would apply to the broader class in (5.7), with the coefficient set $C$ taken to be closed.79

77“Almost surely,” in (5.11), refers to this distribution.

78It is assumed, for this part, that the distribution of $Y(x_1,\dots,x_n)$ for $(x_1,\dots,x_n) \in D$ belongs to $\mathcal{L}^2(\Omega)$, and the same then for the random variable $Y(X_1,\dots,X_n)$ obtained from it.

79Work with the class in (5.7), which does of course satisfy (5.8), can actually be reduced to the linear case, so that generalized linear theory can be applied. To do this we can introduce new random variables $W_i = h_i(X_1,\dots,X_n)$ with distributions inherited from the $X_j$'s and carry out linear regression of $Y$ with respect to $W_1,\dots,W_m$.

There could be many applications of these ideas, and much remains to be explored and developed.
Some  related  research  in  special  cases,  largely  concerned  with  quantile  regression,  can  be  seen  in 
Trindade  and  Uryasev  [2006a],  Trindade  and  Uryasev  [2006b]  and  Golodnikov  et  al.  [2007];  see  also 
Samson  et  al.  [2009]  for  further  motivation. 

The measure of error in quantile regression is indeed of expectation type, so that the second part of our Regression Theorem can be applied if the class $\mathcal{C}$ of functions $f$ is rich enough. The class of linear functions of $X_1,\dots,X_n$ would very likely not meet that standard, but the class in (5.7) may offer hope through judicious choice of $h_1,\dots,h_m$.
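
How a statistic arises from an error measure can be checked numerically in the quantile case. The sketch below assumes the normalized Koenker-Bassett form $\mathcal{E}(Z) = E[\tfrac{\alpha}{1-\alpha}Z_+ + Z_-]$ for the error, an equally weighted sample, and hypothetical names (`kb_error`, `best_c`); it minimizes $\mathcal{E}(Y - C)$ over constants $C$ and recovers the $\alpha$-quantile of the sample.

```python
def kb_error(z, alpha):
    """Normalized Koenker-Bassett error E[(alpha/(1-alpha)) Z_+ + Z_-] on an equally weighted sample."""
    w = alpha / (1.0 - alpha)
    return sum(w * max(t, 0.0) + max(-t, 0.0) for t in z) / len(z)

ys = [float(k) for k in range(1, 11)]   # hypothetical sample 1, 2, ..., 10
alpha = 0.75

# E(Y - C) is piecewise linear and convex in C with kinks at the sample points,
# so a minimizer can be found among the sample points themselves.
errors = {c: kb_error([y - c for y in ys], alpha) for c in ys}
best_c = min(errors, key=errors.get)

print(best_c, errors[best_c])
```

In this sample the minimizer is the point $C$ with $\mathrm{prob}\{Y < C\} \le \alpha \le \mathrm{prob}\{Y \le C\}$, which is the sense in which the error measure "tracks" a quantile.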

6  Probability  Modeling  and  the  Dualization  of  Risk 

More  explanation  about  the  view  of  uncertainty  that  we  take  here  may  be  helpful,  especially  for  the 
sake  of  those  who  would  like  to  make  use  of  the  ideas  without  having  to  go  too  far  into  the  technical 
mathematics  of  probability  theory.  In  modeling  uncertain  quantities  as  random  variables,  we  tacitly 
regard  them  as  having  probability  distributions,  but  this  does  not  mean  we  assume  those  distributions 
are  directly  known.  Sampling,  for  instance,  might  be  required  to  learn  more,  and  even  then,  only 
approximations  might  be  available. 

The characteristics of a random variable $X$, by itself, are embodied in its cumulative distribution function $F_X$, with $F_X(x) = \mathrm{prob}\{X \le x\}$. This induces a probability measure on the real numbers $\mathbb{R}$ which may or may not be expressible by a density function $f$ with respect to ordinary integration, i.e., as $dF_X(x) = f(x)\,dx$. The lack of a density function is paramount when $X$ is a discrete random variable with only finitely many possible outcomes. Then $F_X$ is a step function.

Sometimes the underlying uncertainty being addressed revolves around observations of several random variables $V_1,\dots,V_m$, and their joint distribution. The corresponding probability measure on $\mathbb{R}^m$ is induced then by the multivariate distribution function

$$F_{V_1,\dots,V_m}(v_1,\dots,v_m) = \mathrm{prob}\{(V_1,\dots,V_m) \le (v_1,\dots,v_m)\}. \tag{6.1}$$

Functions $x = g(v_1,\dots,v_m)$ give rise to random variables $X = g(V_1,\dots,V_m)$ having $F_X(x) = \mathrm{prob}\{g(V_1,\dots,V_m) \le x\}$. Again, the distribution of $(V_1,\dots,V_m)$ need not be describable by a density function $f(v_1,\dots,v_m)$. We might be dealing with a discrete distribution of $(V_1,\dots,V_m)$ corresponding to an $m$-dimensional “scatter plot.”

The standard framework of a probability space serves for handling all these aspects of randomness easily and systematically. It consists of a set $\Omega$ supplied with a probability measure $P_0$ and a field $\mathcal{A}$ of its subsets.80 We think of the elements $\omega \in \Omega$ as “future states” (of information), or “scenarios.” Having a subset $A$ of $\Omega$ belong to $\mathcal{A}$ means that the probability of $\omega$ being in $A$ is regarded as known in the present: $\mathrm{prob}\{A\} = P_0(A)$. In that way, the field $\mathcal{A}$ is a model for present information about the future. There could be multistage approaches to such information, in which $\mathcal{A}$ is just the first in a chain of ever-larger collections of subsets of $\Omega$, but we are not looking at that. A scenario $\omega$ could, in our setting, nonetheless involve multiple time periods, but we are not going to consider, here, how additional observations, as the scenario unfolds, might be put to use in optimization.

Random variables in this framework are functions $X : \Omega \to \mathbb{R}$, with future outcomes $X(\omega)$, such that, for every $x \in \mathbb{R}$, the set $A = \{\omega \mid X(\omega) \le x\}$ belongs to $\mathcal{A}$.81 The expected value of a random variable $X$ is the integral $EX = \int_\Omega X(\omega)\,dP_0(\omega)$. As a special case, the probability space $(\Omega,\mathcal{A},P_0)$

80We write $P_0$ for this underlying probability measure in order to reserve $P$ for general purposes below.

81The sets $A \in \mathcal{A}$ are called the “measurable” sets and the functions $X$ in question the “measurable” functions.


could be generated by future observations of some variables $V_1,\dots,V_m$, as above, in which case $\Omega$ would be a subset of $\mathbb{R}^m$ with elements $\omega = (v_1,\dots,v_m)$ and $P_0$ would be the probability measure induced by the joint distribution function $F_{V_1,\dots,V_m}$. If $P_0$ has a density function $f(v_1,\dots,v_m)$ with respect to ordinary integration, then for $X = g(V_1,\dots,V_m)$ one has $EX = \int g(v_1,\dots,v_m) f(v_1,\dots,v_m)\,dv_1 \cdots dv_m$, but without such a density, it is not possible to rely this way on $dv_1 \cdots dv_m$. That is why, in achieving adequate generality, it is crucial to refer to a background probability measure $P_0$ as the source of all the distributions that come up.

Despite that focus, a means is provided for considering alternatives $P$ to $P_0$, and indeed this will be very important in subsequent discussions of risk and its dualization. Other probability measures $P$ can enter the picture as long as the expected value $E_P(X) = \int X(\omega)\,dP(\omega)$ can be expressed by $E[XQ] = \int X(\omega)Q(\omega)\,dP_0(\omega)$ for some random variable $Q$, which is then called the density of $P$ with respect to $P_0$ with notation $Q = dP/dP_0$.82 For instance, in the case where $\Omega$ has finitely many elements $\omega_k$, $k = 1,\dots,N$, if $P_0$ gives them equal weight $1/N$ but $P$ assigns probability $p_k$ to $\omega_k$, then $Q(\omega_k) = p_k N$.
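
The finite case just described is easy to verify directly. The following sketch uses hypothetical outcome values and probabilities (the lists `X` and `p` are illustrative assumptions, not data from the text) and checks that the expectation under $P$ coincides with $E[XQ]$ under the uniform $P_0$, and that $EQ = 1$.

```python
N = 4
X = [10.0, -2.0, 5.0, 1.0]     # hypothetical outcomes X(omega_k)
p = [0.1, 0.2, 0.3, 0.4]       # hypothetical probabilities p_k under the alternative P
Q = [pk * N for pk in p]       # density dP/dP0 when P0 gives weight 1/N to each omega_k

E_P_X = sum(pk * xk for pk, xk in zip(p, X))       # expectation of X under P
E_XQ = sum(xk * qk for xk, qk in zip(X, Q)) / N    # E[XQ] taken under P0
E_Q = sum(Q) / N                                   # a density must average to 1 under P0

print(E_P_X, E_XQ, E_Q)
```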

Another point needing emphasis is that little is really lost in supposing the existence of an underlying probability measure $P_0$, even if prospects of knowing much about it are low. Convenience in theory can be served nonetheless. In “robust optimization,” for example, direct probability is in principle avoided, and yet a so-called uncertainty set has to be constructed. That set, often identified through rough considerations of probability anyway, can be identified here with the space $\Omega$. The worst-case risk measure $\mathcal{R}(X) = \sup X$, which is the prime focus of “robust optimization,” is captured anyway as generated by considering all $P$ alternative to $P_0$ in the above sense, as will be explained below. Similarly, the “distributed worst-case” risk measure of Example 6 is covered without having to know very much about $P_0$.

The need to deal securely with expectations of random variables and certain products of random variables forces some restrictions. For any random variable $X$, the expressions $\|X\|_p$ introduced earlier are well defined but could be $\infty$. It is common practice to work with the spaces83

$$\mathcal{L}^p(\Omega) = \mathcal{L}^p(\Omega,\mathcal{A},P_0) = \{X \mid \|X\|_p < \infty\}, \ \text{ where } \ \mathcal{L}^1(\Omega) \supset \cdots \supset \mathcal{L}^p(\Omega) \supset \cdots \supset \mathcal{L}^\infty(\Omega). \tag{6.2}$$


For  any  X  in  these  spaces,  EX  is  well  defined  and  finite,  but  the  situation  for  products  of  random 
variables,  like  XQ  above,  is  more  delicate.  While  there  are  options  with  X  in  one  space  and  Q  in 
another,  no  choice  is  perfect. 

For our purposes here, $\mathcal{L}^2(\Omega)$ has been taken as the platform. That has the simplifying advantage that $E|XQ| < \infty$ for any $X \in \mathcal{L}^2(\Omega)$ and $Q \in \mathcal{L}^2(\Omega)$. However, it does mean that, in considering alternative probability measures $P$ with densities $Q = dP/dP_0$, the restriction must be made to the cases where $\int_\Omega (dP/dP_0)^2(\omega)\,dP_0(\omega) < \infty$. Actually, though, this restriction makes little difference in the end, because other probability measures can adequately be mimicked by these (and for finite $\Omega$ it is no restriction at all).

Dualization concerns the development of “dual representations” of various functionals, also called “envelope representations,” which can yield major insights and provide tools for characterizing optimality. The functionals $\mathcal{F}$ may in general take on $\infty$ as a value (although usually $-\infty$ is excluded), and some notation for handling that is needed. The effective domain of $\mathcal{F}$ is the set

$$\operatorname{dom}\mathcal{F} = \{X \in \mathcal{L}^2(\Omega) \mid \mathcal{F}(X) < \infty\}. \tag{6.3}$$

82Such measures $P$ are said to be “absolutely continuous” with respect to $P_0$.

83When $\Omega$ is a discrete set of $N$ elements, these spaces coincide and can be identified with $\mathbb{R}^N$.

When $\mathcal{F}$ is convex, this set is convex, but $\mathcal{F}$ closed convex does not necessarily entail $\operatorname{dom}\mathcal{F}$ also being closed. The platform for dualization is a correspondence for closed convex functionals $\mathcal{F}$:84 85

$$\mathcal{F} : \mathcal{L}^2 \to (-\infty,\infty] \text{ closed convex},\ \mathcal{F} \not\equiv \infty \iff \exists\, \mathcal{G} : \mathcal{L}^2 \to (-\infty,\infty],\ \mathcal{G} \not\equiv \infty, \text{ with } \mathcal{F}(X) = \sup_{Q \in \mathcal{L}^2(\Omega)} \{E[XQ] - \mathcal{G}(Q)\} \ \text{ for all } X.$$

Moreover the lowest such $\mathcal{G}$ is $\mathcal{G} = \mathcal{F}^*$, where $\mathcal{F}^*$ is closed convex and given by

$$\mathcal{F}^*(Q) = \sup_{X \in \mathcal{L}^2(\Omega)} \{E[XQ] - \mathcal{F}(X)\} \ \text{ for all } Q. \tag{6.4}$$


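The functional $\mathcal{F}^*$ is said to be conjugate to $\mathcal{F}$, which in turn is conjugate to $\mathcal{F}^*$ through the first formula in (6.4) in the case of $\mathcal{G} = \mathcal{F}^*$, namely

$$\mathcal{F}(X) = \sup_{Q \in \mathcal{L}^2(\Omega)} \{E[XQ] - \mathcal{F}^*(Q)\} \ \text{ for all } X. \tag{6.5}$$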


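The nonempty convex set $\operatorname{dom}\mathcal{F}^* = \{Q \mid \mathcal{F}^*(Q) < \infty\}$ can replace $\mathcal{L}^2(\Omega)$ in this formula, and similarly $\operatorname{dom}\mathcal{F}$ can replace $\mathcal{L}^2(\Omega)$ in the second formula of (6.4), the one defining $\mathcal{F}^*$. Here are some cases that will be especially important to us:86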


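$$\text{for } \mathcal{F} \text{ closed convex} \not\equiv \infty: \quad \begin{cases} \mathcal{F}(0) = 0 \iff \inf \mathcal{F}^* = 0, \\ \mathcal{F}(X) \ge EX \iff \mathcal{F}^*(1) \le 0, \\ \mathcal{F} \text{ is monotonic} \iff Q \ge 0 \text{ when } Q \in \operatorname{dom}\mathcal{F}^*, \\ \mathcal{F} \text{ is pos. homog.} \iff \mathcal{F}^*(Q) = 0 \text{ when } Q \in \operatorname{dom}\mathcal{F}^*, \end{cases} \tag{6.6}$$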


(where the “1” in the second line refers to the constant r.v. with value 1). The final case, with positive homogeneity, says that

there is a one-to-one correspondence between nonempty, closed, convex sets $\mathcal{Q} \subset \mathcal{L}^2(\Omega)$ and closed convex pos. homogeneous functionals $\mathcal{F} : \mathcal{L}^2 \to (-\infty,\infty]$, given by

$$\mathcal{F}(X) = \sup_{Q \in \mathcal{Q}} E[XQ] \ \text{ for all } X, \qquad \mathcal{Q} = \{Q \mid E[XQ] \le \mathcal{F}(X) \text{ for all } X\}. \tag{6.7}$$

The second formula in (6.7) identifies $\mathcal{Q}$ with $\operatorname{dom}\mathcal{F}^*$. Any $\mathcal{Q}$ for which the first formula holds must moreover have $\operatorname{dom}\mathcal{F}^*$ as its closed, convex hull.

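Envelope Theorem87. The functionals $\mathcal{J}$ that are the conjugates $\mathcal{R}^*$ of the regular measures of risk $\mathcal{R}$ on $\mathcal{L}^2(\Omega)$ are the closed convex functionals $\mathcal{J}$ with effective domains $\mathcal{Q} = \operatorname{dom}\mathcal{J}$ such that

(a) $EQ = 1$ for all $Q \in \mathcal{Q}$,

(b) $0 = \mathcal{J}(1) \le \mathcal{J}(Q)$ for all $Q \in \mathcal{Q}$,

(c) for each nonconstant $X \in \mathcal{L}^2(\Omega)$ there exists $Q \in \mathcal{Q}$ such that $E[XQ] - EX > \mathcal{J}(Q)$.

The dual representation of $\mathcal{R}$ corresponding to $\mathcal{J} = \mathcal{R}^*$ is

$$\mathcal{R}(X) = \sup_{Q \in \mathcal{Q}} \{E[XQ] - \mathcal{J}(Q)\}. \tag{6.8}$$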

84See Theorem 5 of Rockafellar [1974]; this is the case of $\mathcal{L}^2(\Omega)$ paired with itself through $\langle X,Q\rangle = E[XQ]$. The operation $\mathcal{F} \to \mathcal{F}^*$ is called the Legendre-Fenchel transform.

85Saying $\mathcal{F}^*$ is “lowest” means here that every $\mathcal{G}$ with the indicated property satisfies $\mathcal{G}(Q) \ge \mathcal{F}^*(Q)$ for all $Q \in \mathcal{L}^2$.

86The first is immediate from (6.5) with $X = 0$, while the second follows from (6.4) with $Q = 1$. In the third, the sufficiency comes from (6.5), and the necessity as well, because monotonicity of $\mathcal{F}$ precludes the existence of a nonmonotonic affine functional $\mathcal{L}$ with $\mathcal{L}(X) \le \mathcal{F}(X)$ for all $X$. The necessity in the fourth is clear from (6.4) (because positive homogeneity allows only $0$ or $\infty$ as the supremum); the sufficiency is obvious from (6.5).

87Most of the facts in this compilation, which follow from the general properties of conjugacy as above, are already well understood and have been covered, for instance, in Föllmer and Schied [2004]. The new aspects are the dualization of aversity in condition (c) and the final assertion, connecting with the dualization of regret.


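Here $\mathcal{R}$ is positively homogeneous if and only if $\mathcal{J}(Q) = 0$ for all $Q \in \mathcal{Q}$, whereas $\mathcal{R}$ is monotonic if and only if $Q \ge 0$ for all $Q \in \mathcal{Q}$.

If $\mathcal{V}$ is a regular measure of regret that projects to $\mathcal{R}$, then $\mathcal{Q} = \{Q \in \operatorname{dom}\mathcal{V}^* \mid EQ = 1\}$ and the conjugate $\mathcal{J} = \mathcal{R}^*$ has $\mathcal{J}(Q) = \mathcal{V}^*(Q)$ for $Q \in \mathcal{Q}$, but $\mathcal{J}(Q) = \infty$ for $Q \notin \mathcal{Q}$.

The error measure $\mathcal{E}$ paired with the regret measure $\mathcal{V}$ has $\mathcal{E}^*(X) = \mathcal{V}^*(X + 1)$. Likewise, the deviation measure $\mathcal{D}$ paired with the risk measure $\mathcal{R}$ has $\mathcal{D}^*(X) = \mathcal{R}^*(X + 1)$.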
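
The shift by 1 in these last two identities can be traced in one line from the conjugacy formula (6.4). The derivation below is a sketch that assumes the quadrangle relations $\mathcal{E}(Y) = \mathcal{V}(Y) - EY$ and $\mathcal{D}(Y) = \mathcal{R}(Y) - EY$ from the earlier sections, with $Y$ used here only as the variable of the supremum:

$$\mathcal{E}^*(X) = \sup_{Y \in \mathcal{L}^2(\Omega)} \{E[YX] - \mathcal{V}(Y) + EY\} = \sup_{Y \in \mathcal{L}^2(\Omega)} \{E[Y(X+1)] - \mathcal{V}(Y)\} = \mathcal{V}^*(X+1),$$

and the same steps with $\mathcal{R}$ in place of $\mathcal{V}$ give $\mathcal{D}^*(X) = \mathcal{R}^*(X+1)$.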

Risk envelopes and identifiers. The convex set $\mathcal{Q}$ in this theorem is called the risk envelope associated with $\mathcal{R}$, and a $Q$ furnishing the maximum in (6.8) is a risk identifier for $X$.

The monotonic case in the theorem combines $EQ = 1$ with $Q \ge 0$ and thereby allows us to interpret each $Q \in \mathcal{Q}$ as a probability density $dP/dP_0$ describing an alternative probability measure $P$ on $\Omega$. For positively homogeneous $\mathcal{R}$, the $\mathcal{J}(Q)$ term drops out of the representation in (6.8) (by being 0). The formula then characterizes $\mathcal{R}(X)$ as giving the worst “cost” that might result from considering the expected values $E[XQ] = E_P[X]$ over all those alternative probability measures $P$ having densities $Q$ in the risk envelope $\mathcal{Q}$.

The nonhomogeneous case has a similar interpretation, but distinguishes within $\mathcal{Q}$ a subset $\mathcal{Q}_0$ consisting of the densities $Q$ for which $\mathcal{J}(Q) = 0$, which always includes $Q = 1$ (the density of $P_0$ with respect to itself). Densities $Q$ that belong to $\mathcal{Q}$ but not $\mathcal{Q}_0$ have $\mathcal{J}(Q) \in (0,\infty)$. In (6.8) that term then drags the expectation down. In a sense, $\mathcal{J}(Q)$ downgrades the importance of such densities.

The conjugates $\mathcal{V}^*$ of regular measures of regret $\mathcal{V}$ have virtually the same characterization as the conjugates $\mathcal{R}^*$ in the theorem. Property (a) is omitted, but on the other hand there is a provision to enforce the property in (3.15) (in the cases when it is not guaranteed to hold automatically). This provision is that $\mathcal{V}^*(C) < \infty$ for $C$ near enough to 1.

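Some examples of risk envelopes in the positively homogeneous case, where (6.8) holds with $\mathcal{J}(Q)$ omitted, are the following:88

$$\begin{array}{lcl} \mathcal{R}(X) = EX + \lambda\sigma(X) & \longleftrightarrow & \mathcal{Q} = \{Q = 1 + \lambda Y \mid \|Y\|_2 \le 1,\ EY = 0\} \\ \mathcal{R}(X) = \mathrm{CVaR}_\alpha(X) & \longleftrightarrow & \mathcal{Q} = \{Q \mid 0 \le Q \le \tfrac{1}{1-\alpha},\ EQ = 1\} \\ \mathcal{R}(X) = \sup X & \longleftrightarrow & \mathcal{Q} = \{Q \mid Q \ge 0,\ EQ = 1\} \\ \mathcal{R}(X) = \sum_{k=1}^r \lambda_k \mathcal{R}_k(X) & \longleftrightarrow & \mathcal{Q} = \{Q = \sum_{k=1}^r \lambda_k Q_k \mid Q_k \in \mathcal{Q}_k\}, \ \text{ where } \mathcal{R}_k \longleftrightarrow \mathcal{Q}_k. \end{array} \tag{6.9}$$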
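
The CVaR envelope in (6.9) can be checked on a small finite space. The sketch below is illustrative only: it assumes equally likely outcomes, takes the primal value from the standard minimization formula $\mathrm{CVaR}_\alpha(X) = \min_C \{C + (1-\alpha)^{-1}E[\max(0, X - C)]\}$, and builds a maximizing density greedily by placing the cap $1/(1-\alpha)$ on the largest outcomes until $EQ = 1$ is met. The function names and data are hypothetical.

```python
def cvar_primal(x, alpha):
    """CVaR via the minimization formula; the minimum is attained at a sample point."""
    n = len(x)
    return min(c + sum(max(t - c, 0.0) for t in x) / ((1.0 - alpha) * n) for c in x)

def cvar_dual(x, alpha):
    """Maximize E[XQ] over 0 <= Q <= 1/(1-alpha), EQ = 1, on an equally weighted space."""
    n = len(x)
    cap = 1.0 / (1.0 - alpha)
    remaining = float(n)      # the values of Q must sum to n so that EQ = 1
    q = [0.0] * n
    # Load the density onto the worst (largest) outcomes first.
    for i in sorted(range(n), key=lambda j: x[j], reverse=True):
        q[i] = min(cap, remaining)
        remaining -= q[i]
    value = sum(xi * qi for xi, qi in zip(x, q)) / n
    return value, q

x = [3.0, -1.0, 7.0, 2.0, 5.0, 0.0, 4.0, 1.0, 6.0, -2.0]   # hypothetical losses
alpha = 0.8
primal = cvar_primal(x, alpha)
dual, q = cvar_dual(x, alpha)

print(primal, dual)
```

The density `q` returned by `cvar_dual` plays the role of a risk identifier for this `x`: it concentrates the alternative probability on the upper tail.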


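Another illustration comes out of Example 6, which can now be formalized via (2.5) in terms of a partition of $\Omega$ into disjoint subsets $\Omega_k$ of probability $p_k > 0$ with $\sup_k X$ being the essential supremum of $X$ on $\Omega_k$ and $E_k X$ being the conditional expectation $E[X \mid \Omega_k]$:

$$\mathcal{R}(X) = \sum_{k=1}^r p_k \sup\nolimits_k X \ \longleftrightarrow\ \mathcal{Q} = \Big\{ Q \ge 0 \ \Big|\ p_k = E[Q\,\mathbf{1}_{\Omega_k}] = \int_{\Omega_k} Q(\omega)\,dP_0(\omega) \Big\}. \tag{6.10}$$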


The risk envelope $\mathcal{Q}$ for the $p$-order superquantile risk measure of Example 12 has not specifically been worked out, but strong clues have been furnished by Dentcheva, Penev and Ruszczyński [2013]. The dual expression derived there indicates that the risk envelope in this case is a union of risk envelopes for mixed quantile risk measures like (2.7) (which are covered by the second and fourth cases of (6.9), except that finite sums need to be replaced by general “continuous” sums as in (2.8)).

88These envelopes were worked out in Rockafellar et al. [2002]; see also Rockafellar et al. [2006a].

Examples beyond positive homogeneity, where nonzero values of $\mathcal{J}$ may enter, are simple to work out in the expectation case:

For quadrangles in the Expectation Theorem, with regret $\mathcal{V}(X) = E[v(X)]$, the conjugate $\mathcal{J} = \mathcal{R}^*$ of the risk measure $\mathcal{R}$ projected from $\mathcal{V}$ is given by

$$\mathcal{J}(Q) = \begin{cases} E[v^*(Q)] & \text{if } EQ = 1, \\ \infty & \text{if } EQ \ne 1, \end{cases} \quad \text{for the function } v^* \text{ conjugate to } v, \tag{6.11}$$

given by $v^*(q) = \sup_x \{xq - v(x)\}$. The properties of $v^*$ corresponding to those of $v$ in (4.3) are that $v^*$ is closed convex with $v^*(1) = 0$, $v^{*\prime}(1) = 0$.


This holds from the description in the Envelope Theorem of the $\mathcal{J}$ in projection from $\mathcal{V}$ because the functional conjugate to $\mathcal{V}(X) = E[v(X)]$ is $\mathcal{V}^*(Q) = E[v^*(Q)]$.89 The dualization of the properties of $v$ to those of $v^*$ comes from one-dimensional convex analysis; see Rockafellar [1970].

An especially interesting illustration is furnished by Example 8, where one has

$$v(x) = \exp x - 1, \qquad v^*(q) = \begin{cases} q\log q - q + 1 & \text{if } q \ge 0, \\ \infty & \text{if } q < 0, \end{cases} \tag{6.12}$$

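with $0\log 0 = 0$, the usual convention. Through (6.11) this yields

$$\mathcal{R}(X) = \log E[\exp X] \ \longleftrightarrow\ \mathcal{J}(Q) = \begin{cases} E[Q\log Q] & \text{if } Q \ge 0,\ EQ = 1, \\ \infty & \text{otherwise.} \end{cases} \tag{6.13}$$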


Here $-\mathcal{J}(Q)$ is a well known expression for the relative entropy with respect to the probability measure $P_0$ of the probability measure $P$ having $Q = dP/dP_0$.90
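
The dual representation (6.8) for this entropic case can be verified on a finite, equally weighted space. In the sketch below the outcomes and the comparison densities are hypothetical; the candidate risk identifier $Q^* = \exp X / E[\exp X]$ is an assumption of the sketch, checked by evaluating $E[XQ] - E[Q\log Q]$ at $Q^*$ and at two other feasible densities.

```python
import math

X = [1.0, -0.5, 2.0, 0.3]      # hypothetical outcomes on four equally likely scenarios
N = len(X)

def E(values):
    """Expectation under the uniform P0."""
    return sum(values) / N

def dual_objective(Q):
    """E[XQ] - E[Q log Q], using the convention 0 log 0 = 0."""
    entropy_term = E([q * math.log(q) if q > 0 else 0.0 for q in Q])
    return E([x * q for x, q in zip(X, Q)]) - entropy_term

m = E([math.exp(x) for x in X])
risk = math.log(m)                         # R(X) = log E[exp X]
Q_star = [math.exp(x) / m for x in X]      # candidate maximizer, satisfies Q >= 0, EQ = 1

Q_uniform = [1.0, 1.0, 1.0, 1.0]           # density of P0 itself
Q_other = [2.0, 0.0, 1.5, 0.5]             # another feasible density with EQ = 1

print(risk, dual_objective(Q_star), dual_objective(Q_uniform), dual_objective(Q_other))
```

The value at `Q_uniform` is just $EX$, so the comparison also illustrates the aversity inequality $\mathcal{R}(X) > EX$ for this nonconstant $X$.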

Results in Rockafellar [1974] can exploit the general dualization of $\mathcal{R}$ to $\mathcal{J}$ through Lagrangian formats for optimization involving $\mathcal{R}$ which generate dual problems. Even more powerful developments of optimization duality, tailored to the fine points of financial mathematics, have recently been contributed by Pennanen [2011]. For more insights on entropic modeling versus risk, see Grechuk et al. [2008], which emphasizes the role of deviation measures $\mathcal{D}$ in place of risk measures $\mathcal{R}$.

Also coming out of the Envelope Theorem is further insight into the degree of nonuniqueness of the error measures $\mathcal{E}$ that project to a specified deviation measure $\mathcal{D}$, or the regret measures $\mathcal{V}$ that project to a specified risk measure $\mathcal{R}$. In the positively homogeneous case of $\mathcal{V}$, for instance, the conjugate $\mathcal{V}^*$ has by (6.6), (6.7), the simple form that it is 0 on a certain closed, convex set $\mathcal{K}$ but $\infty$ outside of $\mathcal{K}$; then $\operatorname{dom}\mathcal{V}^* = \mathcal{K}$. The theorem says the risk envelope $\mathcal{Q}$ determining the risk measure $\mathcal{R}$ projected from $\mathcal{V}$ has $\mathcal{Q}$ equal to the intersection of $\mathcal{K}$ with the hyperplane $\{Q \mid EQ = 1\}$. That intersection only uses one “slice” of $\mathcal{K}$. Different $\mathcal{K}$'s that agree for this “slice” will give different $\mathcal{V}$'s yielding the same $\mathcal{R}$. Discovering a “natural” antecedent $\mathcal{V}$ for $\mathcal{R}$ therefore amounts geometrically to discovering a “natural” extension $\mathcal{K}$ of $\mathcal{Q}$ beyond the hyperplane $\{Q \mid EQ = 1\}$.

The  Envelope  Theorem,  as  presented  here,  is  based  on  duality  theory  in  convex  analysis,  but  the 
idea  of  expressing  preferences  through  functionals  defined  by  a  max  or  min  over  a  set  of  probability 
measures,  as  a  representation  of  distrust  or  ambiguity,  is  far  from  new.  In  finance,  the  concept  is  often 
attributed  to  Artzner  et  al.  [1999],  but  in  statistics  it  can  be  traced  to  Huber  [1981]  and  his  sublinear 
expectation  functionals.  There  is  a  strong  echo  also  in  the  theory  of  preferences  in  economics,  where  a 
minimum  of  expected  utility  over  a  set  of  probability  measures  has  been  explored  from  various  angles. 
For  that  literature,  see  Maccheroni  et  al.  [2006],  Strzalecki  [2011],  and  their  references. 

89This follows a general rule of convex analysis in [Rockafellar, 1974, Theorem 21]. The “inner product” in the function space $\mathcal{L}^2(\Omega)$ is $\langle X,Q\rangle = E[XQ]$.

90See Ben-Tal and Teboulle [2007] for more background. Another name for this is Kullback-Leibler distance.